Google Launches Gemini 3.1 Flash-Lite: The Most Cost-Effective Model in the Gemini 3 Series

Mar 3, 2026 · 2 min read · google gemini flash-lite ai language-model ·

Share on:

Google Launches Gemini 3.1 Flash-Lite: The Most Cost-Effective Model in the Gemini 3 Series

Google today announced the launch of Gemini 3.1 Flash-Lite, the fastest and most cost-effective model in the Gemini 3 series. Designed for high-volume workloads, the new model delivers best-in-class intelligence at a fraction of the cost of larger models.

Context

The Gemini Flash series has been extremely popular among developers since its launch, with models like Gemini 2 and 2.5 Flash processing trillions of tokens across hundreds of thousands of applications built by millions of developers. The new 3.1 Flash-Lite continues this tradition, offering an exceptional balance of performance, speed, and cost.

Launch Details

Pricing and Availability

Gemini 3.1 Flash-Lite is available in preview for developers through the Gemini API in Google AI Studio and for enterprises via Vertex AI, with highly competitive pricing:

Input: $0.25 per 1M tokens
Output: $1.50 per 1M tokens

Performance and Benchmarks

Despite the reduced price, 3.1 Flash-Lite delivers impressive performance:

Elo Score of 1432 on the Arena.ai Leaderboard
86.9% on GPQA Diamond (PhD-level reasoning benchmark)
76.8% on MMMU Pro (multimodal understanding)
2.5x faster Time to First Answer Token compared to 2.5 Flash
45% faster output speed

The model outperforms Gemini 2.5 Flash in quality while maintaining significantly lower costs, setting a new standard for cost-effectiveness in language models.

Key Features

3.1 Flash-Lite comes with built-in thinking levels, allowing developers to control how much the model “thinks” for a specific task. This is critical for managing high-frequency workloads.

Ideal use cases:

High-volume translation
Content moderation
User interface generation
Creating simulations
Following complex instructions

Companies Already Adopting

Companies like Latitude, Cartwheel, and Whering are already using 3.1 Flash-Lite to solve complex problems at scale. Early testers highlighted the model’s efficiency and reasoning capabilities, noting that it can handle complex inputs with the precision of higher-tier models.

Flash Series Evolution

The Flash series continues to be the most popular Gemini version. Gemini 3 Flash (launched in December 2025) already offered frontier intelligence with Flash-level speed, and 3.1 Flash-Lite represents the pinnacle of cost-efficiency in the series.

The complete series now includes:

Gemini 3 Pro – For more complex tasks
Gemini 3 Flash – Frontier intelligence built for speed
Gemini 3.1 Flash-Lite – Maximum efficiency for high volume

Google Launches Gemini 3.1 Flash-Lite: The Most Cost-Effective Model in the Gemini 3 Series

Google Launches Gemini 3.1 Flash-Lite: The Most Cost-Effective Model in the Gemini 3 Series

Context

Launch Details

Pricing and Availability

Performance and Benchmarks

Key Features

Companies Already Adopting

Flash Series Evolution

Sources

Translations: