Google Launches Gemini 3.1 Flash-Lite: The Most Cost-Effective Model in the Gemini 3 Series

Google today announced Gemini 3.1 Flash-Lite, the fastest and most cost-effective model in the Gemini 3 series. Designed for high-volume workloads, the new model delivers best-in-class intelligence at a fraction of the cost of larger models.
Context
The Gemini Flash series has been extremely popular among developers since its launch, with models like Gemini 2 and 2.5 Flash processing trillions of tokens across hundreds of thousands of applications built by millions of developers. The new 3.1 Flash-Lite continues this tradition, offering an exceptional balance of performance, speed, and cost.
Launch Details
Pricing and Availability
Gemini 3.1 Flash-Lite is available in preview to developers through the Gemini API in Google AI Studio and to enterprises via Vertex AI, priced as follows:
- Input: $0.25 per 1M tokens
- Output: $1.50 per 1M tokens
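To make the pricing concrete, here is a minimal sketch of a cost estimate using the two per-token rates listed above. The function name and the example token counts are illustrative assumptions, and the sketch ignores anything not stated in this article (such as other billing tiers or discounts).

```python
# Preview prices quoted in this article, in USD per 1M tokens.
INPUT_USD_PER_MILLION = 0.25
OUTPUT_USD_PER_MILLION = 1.50


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of a workload from its token counts.

    Hypothetical helper for illustration only; it applies the two
    listed rates and nothing else.
    """
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    return input_cost + output_cost


# Example workload (made-up numbers): 10,000 requests, each with
# 2,000 input tokens and 500 output tokens.
requests = 10_000
total = estimate_cost_usd(requests * 2_000, requests * 500)
print(f"Estimated cost: ${total:.2f}")
```

Under these assumed token counts, output tokens account for more of the bill than input tokens even though there are four times fewer of them, because the output rate is six times the input rate.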
Performance and Benchmarks
Despite the reduced price, 3.1 Flash-Lite delivers impressive performance:
- Elo Score of 1432 on the Arena.ai Leaderboard
- 86.9% on GPQA Diamond (PhD-level reasoning benchmark)
- 76.8% on MMMU Pro (multimodal understanding)
- 2.5x faster Time to First Answer Token than 2.5 Flash
- 45% faster output speed than 2.5 Flash
The model outperforms Gemini 2.5 Flash in quality while maintaining significantly lower costs, setting a new standard for cost-effectiveness in language models.
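As a rough illustration of what the two speed figures could mean for end-to-end response time, the sketch below combines them in a simple latency model. The baseline latency and throughput values are hypothetical placeholders, not measurements, and the sketch assumes that "45% faster output speed" means 1.45 times the baseline tokens per second.

```python
def estimated_response_seconds(
    ttft_s: float, tokens_per_s: float, output_tokens: int
) -> float:
    """Simple model: time to first token plus time to stream the output."""
    return ttft_s + output_tokens / tokens_per_s


# Hypothetical baseline figures, chosen only to make the arithmetic concrete.
BASELINE_TTFT_S = 1.0
BASELINE_TOKENS_PER_S = 200.0

# Relative improvements quoted in this article.
TTFT_SPEEDUP = 2.5      # 2.5x faster Time to First Answer Token
OUTPUT_SPEEDUP = 1.45   # assumed reading of "45% faster output speed"

OUTPUT_TOKENS = 500

baseline = estimated_response_seconds(
    BASELINE_TTFT_S, BASELINE_TOKENS_PER_S, OUTPUT_TOKENS
)
flash_lite = estimated_response_seconds(
    BASELINE_TTFT_S / TTFT_SPEEDUP,
    BASELINE_TOKENS_PER_S * OUTPUT_SPEEDUP,
    OUTPUT_TOKENS,
)

print(f"baseline: {baseline:.2f}s, estimated: {flash_lite:.2f}s")
```

The actual gain depends on the real baseline numbers and on response length: short responses benefit mostly from the faster first token, while long responses benefit mostly from the higher output speed.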
Key Features
3.1 Flash-Lite comes with built-in thinking levels, allowing developers to control how much the model “thinks” for a specific task. This control matters for high-frequency workloads, where spending less reasoning on simple requests helps keep latency and cost down.
Ideal use cases:
- High-volume translation
- Content moderation
- User interface generation
- Creating simulations
- Following complex instructions
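One way an application might use thinking levels is to pick a level per task type before sending a request. The sketch below shows only that routing step and calls no real API. The level names (`low`, `medium`, `high`), the task keys, and the mapping itself are assumptions made for illustration; the article does not specify them.

```python
from enum import Enum


class ThinkingLevel(Enum):
    # Hypothetical level names; check the API documentation for real values.
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# Assumed mapping from the use cases listed above to a thinking level.
# Which level suits which task is a judgment call, not a documented rule.
TASK_LEVELS = {
    "translation": ThinkingLevel.LOW,
    "content_moderation": ThinkingLevel.LOW,
    "ui_generation": ThinkingLevel.MEDIUM,
    "simulation": ThinkingLevel.HIGH,
    "complex_instructions": ThinkingLevel.HIGH,
}


def pick_thinking_level(
    task_type: str, default: ThinkingLevel = ThinkingLevel.MEDIUM
) -> ThinkingLevel:
    """Return the configured level for a task, or the default if unknown."""
    return TASK_LEVELS.get(task_type, default)


for task in ("translation", "simulation", "summarization"):
    print(task, "->", pick_thinking_level(task).value)
```

Keeping the mapping in one table makes it easy to tune levels per task as cost and quality measurements come in.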
Companies Already Adopting
Companies like Latitude, Cartwheel, and Whering are already using 3.1 Flash-Lite to solve complex problems at scale. Early testers highlighted the model’s efficiency and reasoning capabilities, noting that it can handle complex inputs with the precision of higher-tier models.
Flash Series Evolution
The Flash series remains the most popular line of Gemini models. Gemini 3 Flash (launched in December 2025) already offered frontier intelligence at Flash-level speed, and 3.1 Flash-Lite is the most cost-efficient model in the series.
The complete series now includes:
- Gemini 3 Pro – For more complex tasks
- Gemini 3 Flash – Frontier intelligence built for speed
- Gemini 3.1 Flash-Lite – Maximum efficiency for high volume