Google Launches Gemini 3.1 Flash-Lite: The Most Cost-Effective Model in the Gemini 3 Series

Google Launches Gemini 3.1 Flash-Lite: The Most Cost-Effective Model in the Gemini 3 Series

Google Launches Gemini 3.1 Flash-Lite: The Most Cost-Effective Model in the Gemini 3 Series

Google today announced the launch of Gemini 3.1 Flash-Lite, the fastest and most cost-effective model in the Gemini 3 series. Designed for high-volume workloads, the new model delivers best-in-class intelligence at a fraction of the cost of larger models.

Context

The Gemini Flash series has been extremely popular among developers since its launch, with models like Gemini 2 and 2.5 Flash processing trillions of tokens across hundreds of thousands of applications built by millions of developers. The new 3.1 Flash-Lite continues this tradition, offering an exceptional balance of performance, speed, and cost.

Launch Details

Pricing and Availability

Gemini 3.1 Flash-Lite is available in preview for developers through the Gemini API in Google AI Studio and for enterprises via Vertex AI, with highly competitive pricing:

  • Input: $0.25 per 1M tokens
  • Output: $1.50 per 1M tokens

Performance and Benchmarks

Despite the reduced price, 3.1 Flash-Lite delivers impressive performance:

  • Elo Score of 1432 on the Arena.ai Leaderboard
  • 86.9% on GPQA Diamond (PhD-level reasoning benchmark)
  • 76.8% on MMMU Pro (multimodal understanding)
  • 2.5x faster Time to First Answer Token compared to 2.5 Flash
  • 45% faster output speed

The model outperforms Gemini 2.5 Flash in quality while maintaining significantly lower costs, setting a new standard for cost-effectiveness in language models.

Key Features

3.1 Flash-Lite comes with built-in thinking levels, allowing developers to control how much the model “thinks” for a specific task. This is critical for managing high-frequency workloads.

Ideal use cases:

  • High-volume translation
  • Content moderation
  • User interface generation
  • Creating simulations
  • Following complex instructions

Companies Already Adopting

Companies like Latitude, Cartwheel, and Whering are already using 3.1 Flash-Lite to solve complex problems at scale. Early testers highlighted the model’s efficiency and reasoning capabilities, noting that it can handle complex inputs with the precision of higher-tier models.

Flash Series Evolution

The Flash series continues to be the most popular Gemini version. Gemini 3 Flash (launched in December 2025) already offered frontier intelligence with Flash-level speed, and 3.1 Flash-Lite represents the pinnacle of cost-efficiency in the series.

The complete series now includes:

  • Gemini 3 Pro – For more complex tasks
  • Gemini 3 Flash – Frontier intelligence built for speed
  • Gemini 3.1 Flash-Lite – Maximum efficiency for high volume

Sources

Translations: