Gemini 3 API Pricing: Ultra, Pro, Flash & Flash-Lite Compared (2026)
Google's Gemini 3 series is here — Ultra, Pro, Flash, and Flash-Lite. Full pricing breakdown, how each model compares to Gemini 2.5, and which to use for your workload.
Google announced the Gemini 3 series in May 2026, bringing a new tier structure: Ultra (frontier reasoning), Pro (balanced production), Flash (speed-optimized), and Flash-Lite (budget). This guide breaks down every pricing tier, compares Gemini 3 to its predecessors, and tells you which model to use for each workload.
Gemini 3 Pricing at a Glance
What Changed from Gemini 2.5?
Gemini 3 Pro ($3.50/1M input) costs 2.8x as much as Gemini 2.5 Pro ($1.25/1M input) but delivers benchmark improvements across reasoning, coding, and multimodal tasks. For cost-sensitive workloads where Gemini 2.5 Pro already performs well, upgrading is not automatically justified. For tasks that were hitting quality limits, however, the improvement is meaningful.
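The generation-over-generation multipliers quoted in this guide follow directly from the listed input prices. Below is a minimal sketch that recomputes them. The dictionary keys are illustrative labels, not official API model IDs, and only input prices are compared because this guide does not list every output price.

```python
# Input prices in USD per 1M tokens, as quoted in this guide.
# Keys are illustrative labels, not official Gemini API model IDs.
INPUT_PRICE_PER_M = {
    "gemini-2.5-pro": 1.25,
    "gemini-3-pro": 3.50,
    "gemini-2.5-flash": 0.30,
    "gemini-3-flash": 0.50,
}


def price_multiplier(old: str, new: str) -> float:
    """How many times as much the new model's input price is versus the old model's."""
    return INPUT_PRICE_PER_M[new] / INPUT_PRICE_PER_M[old]


if __name__ == "__main__":
    for old, new in [("gemini-2.5-pro", "gemini-3-pro"),
                     ("gemini-2.5-flash", "gemini-3-flash")]:
        m = price_multiplier(old, new)
        # A multiplier of m means (m - 1) * 100 percent more per input token.
        print(f"{old} -> {new}: {m:.2f}x ({(m - 1) * 100:.0f}% more per input token)")
```

Note that "2.8x as much" for Pro corresponds to 180% more per input token, while the Flash upgrade is 1.67x, or about 67% more.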
Gemini 3 Ultra: Google's New Frontier Model
At $10/1M input and $30/1M output, Gemini 3 Ultra enters the premium tier alongside GPT-5 and Claude Opus 4.7. The 2M token context window is the largest of any commercial frontier model — equivalent to roughly 1,500,000 words or multiple large codebases in a single prompt.
Prompt caching is available at $2.50/1M — a 75% discount. For any workload with repeated large context (document analysis suites, multi-file coding agents), this brings the effective cost significantly closer to GPT-5 and Claude Opus.
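How much caching helps depends on what share of your input tokens actually hit the cache. A minimal sketch of the blended Ultra input rate, using the $10/1M standard and $2.50/1M cached prices quoted above. It assumes a simple linear blend and ignores any cache storage or cache-write charges, which this guide does not cover.

```python
ULTRA_INPUT_PER_M = 10.00   # standard Gemini 3 Ultra input price quoted in this guide
ULTRA_CACHED_PER_M = 2.50   # cached input price quoted in this guide (75% off)


def blended_input_price(cache_hit_fraction: float,
                        base: float = ULTRA_INPUT_PER_M,
                        cached: float = ULTRA_CACHED_PER_M) -> float:
    """Effective USD per 1M input tokens when a fraction of input tokens is read from cache.

    Assumption: cache storage and cache-write fees are ignored.
    """
    if not 0.0 <= cache_hit_fraction <= 1.0:
        raise ValueError("cache_hit_fraction must be between 0 and 1")
    # Uncached tokens pay the base rate; cached tokens pay the discounted rate.
    return (1.0 - cache_hit_fraction) * base + cache_hit_fraction * cached


if __name__ == "__main__":
    for h in (0.0, 0.5, 0.8, 0.95):
        print(f"{h:.0%} cached -> ${blended_input_price(h):.2f} per 1M input tokens")
```

For example, if 80% of input tokens come from cache, the effective Ultra input rate drops to about $4/1M, well below the $10/1M list price.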
Gemini 3 Flash: The New Default for Most Teams
Gemini 3 Flash at $0.50/1M input is 67% more expensive than Gemini 2.5 Flash ($0.30/1M) but brings a meaningfully higher capability floor — particularly on structured output, code generation, and instruction following. For teams already using Gemini 2.5 Flash for production workloads, the upgrade math depends on your quality requirements.
At 50,000 requests/day with 1,000 input + 400 output tokens, moving from Gemini 2.5 Flash to Gemini 3 Flash adds approximately $300/month on the input side alone (50M input tokens/day × $0.20/1M price difference × 30 days), before any change in output pricing. If that buys fewer retries and less post-processing, it can be cost-neutral or better.
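The monthly figure can be reproduced with a small helper. This is a minimal sketch assuming a 30-day month. The output-price difference is a parameter that defaults to zero because this guide does not quote output prices for both Flash generations; pass the actual difference to get a full estimate.

```python
def monthly_cost_delta(requests_per_day: int,
                       input_tokens: int,
                       output_tokens: int,
                       input_price_delta_per_m: float,
                       output_price_delta_per_m: float = 0.0,
                       days: int = 30) -> float:
    """Extra USD per month caused by a per-1M-token price change.

    output_price_delta_per_m defaults to 0.0 (assumption): only the input
    price difference is known from this guide.
    """
    monthly_input = requests_per_day * input_tokens * days
    monthly_output = requests_per_day * output_tokens * days
    return (monthly_input * input_price_delta_per_m
            + monthly_output * output_price_delta_per_m) / 1_000_000


if __name__ == "__main__":
    # Gemini 2.5 Flash ($0.30/1M input) -> Gemini 3 Flash ($0.50/1M input)
    delta = monthly_cost_delta(50_000, 1_000, 400, 0.50 - 0.30)
    print(f"${delta:,.0f}/month extra on input tokens")
```

Because the 400 output tokens per request add up to 600M output tokens per month, even a small output-price difference could move this estimate noticeably.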
Flash-Lite: Still the Cheapest 1M Context Option
Gemini 3 Flash-Lite at $0.12/1M input is among the cheapest production-quality models with a 1M context window. It outperforms Gemini 2.5 Flash-Lite on most benchmarks while maintaining the same price tier. For bulk classification, data enrichment, and high-volume summarization that doesn't need frontier-level quality, it's the strongest budget option in the Gemini 3 family.
Which Gemini 3 Model Should You Use?
Prompt Caching with Gemini 3
All Gemini 3 models except Flash-Lite support prompt caching. The cache discount is a consistent 75% across the series: smaller than Anthropic's 90% but larger than OpenAI's 50%. For workloads where system prompts or document context exceed 1,000 tokens and repeat across requests, caching remains the single highest-leverage optimization.
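To see why a repeated prefix matters, compare the input cost of one request with and without its shared prefix cached. A minimal sketch assuming the 75% discount stated above and a hypothetical request shape (a 4,000-token shared system prompt plus 500 fresh tokens) priced at the $0.50/1M Gemini 3 Flash input rate. It ignores cache storage fees and any minimum cacheable size.

```python
def request_input_cost(prefix_tokens: int,
                       fresh_tokens: int,
                       price_per_m: float,
                       cache_discount: float = 0.75,
                       prefix_cached: bool = True) -> float:
    """USD input cost for one request whose repeated prefix may be served from cache.

    Assumption: cached tokens cost price_per_m * (1 - cache_discount);
    storage fees and minimum cache sizes are not modeled.
    """
    prefix_rate = price_per_m * (1 - cache_discount) if prefix_cached else price_per_m
    return (prefix_tokens * prefix_rate + fresh_tokens * price_per_m) / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 4,000-token shared system prompt + 500 fresh tokens,
    # priced at the Gemini 3 Flash input rate quoted in this guide.
    uncached = request_input_cost(4_000, 500, 0.50, prefix_cached=False)
    cached = request_input_cost(4_000, 500, 0.50)
    print(f"uncached ${uncached:.6f}, cached ${cached:.6f}, "
          f"saving {1 - cached / uncached:.0%}")
```

The larger the cached prefix relative to the fresh tokens, the closer the per-request saving gets to the full 75%.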
See the full guide: Prompt Caching: Save Up to 90% on LLM API Costs →
Bottom Line
Gemini 3 is a genuine step up from 2.5 — but whether to upgrade depends on your workload. For cost-sensitive high-volume applications already running well on Gemini 2.5 Flash, the upgrade is optional. For quality-critical workloads hitting Gemini 2.5's limits, Gemini 3 Pro or Ultra is worth the cost increase.
Use our token cost calculator to compare Gemini 3 exact costs for your token volumes, or see the Gemini 3 Pro vs 2.5 Pro full comparison →