
Gemini 3 API Pricing: Ultra, Pro, Flash & Flash-Lite Compared (2026)

Google's Gemini 3 series is here — Ultra, Pro, Flash, and Flash-Lite. Full pricing breakdown, how each model compares to Gemini 2.5, and which to use for your workload.

TokenCost Editorial · LLM Cost Research · Updated 2026-05-20 · 6 min read

Google announced the Gemini 3 series in May 2026, bringing a new tier structure: Ultra (frontier reasoning), Pro (balanced production), Flash (speed-optimized), and Flash-Lite (budget). This guide breaks down every pricing tier, compares Gemini 3 to its predecessors, and tells you which model to use for each workload.

Gemini 3 Pricing at a Glance

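| Model | Input /1M | Output /1M | Cached /1M | Context |
|---|---|---|---|---|
| Gemini 3 Ultra | $10 | $30 | $2.50 | 2M |
| Gemini 3 Pro | $3.50 | $14 | $0.875 | 1M |
| Gemini 3 Flash | $0.50 | $2 | $0.125 | 1M |
| Gemini 3 Flash-Lite | $0.12 | $0.48 | n/a | 1M |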
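To turn the per-1M rates into a per-request figure, a minimal sketch follows. The prices are copied from the table above; the dictionary keys are illustrative labels rather than official model IDs, and `request_cost` is a hypothetical helper, not part of any SDK.

```python
# Per-1M-token list prices in USD, copied from the pricing table.
# Keys are illustrative labels, not official API model IDs.
# "cached" is None where the table lists no cached rate.
PRICES = {
    "gemini-3-ultra": {"input": 10.00, "output": 30.00, "cached": 2.50},
    "gemini-3-pro": {"input": 3.50, "output": 14.00, "cached": 0.875},
    "gemini-3-flash": {"input": 0.50, "output": 2.00, "cached": 0.125},
    "gemini-3-flash-lite": {"input": 0.12, "output": 0.48, "cached": None},
}


def request_cost(model, input_tokens, output_tokens, cached_tokens=0):
    """Return the USD cost of one request.

    cached_tokens is the portion of input_tokens billed at the cached rate.
    """
    prices = PRICES[model]
    if cached_tokens < 0 or cached_tokens > input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    if cached_tokens and prices["cached"] is None:
        raise ValueError(f"no cached rate listed for {model}")

    fresh_tokens = input_tokens - cached_tokens
    total = fresh_tokens * prices["input"] + output_tokens * prices["output"]
    if cached_tokens:
        total += cached_tokens * prices["cached"]
    return total / 1_000_000  # prices are quoted per 1M tokens


if __name__ == "__main__":
    print(f"${request_cost('gemini-3-pro', 1_000, 400):.4f} per request")
```

Multiplying the result by your daily request count gives a first-order daily bill; it ignores anything not in the table, such as free tiers or volume discounts.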

What Changed from Gemini 2.5?

Gemini 3 Pro ($3.50/1M input) costs 2.8x as much as Gemini 2.5 Pro ($1.25/1M input) but delivers benchmark improvements across reasoning, coding, and multimodal tasks. For cost-sensitive workloads where Gemini 2.5 Pro already works well, upgrading is not automatically justified; for tasks that were hitting quality limits, the improvement is meaningful.

| Model | Input /1M | Perf score |
|---|---|---|
| Gemini 2.5 Pro | $1.25 | 84 |
| Gemini 3 Pro (new) | $3.50 | 90 (+7%) |
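The trade-off is easier to see with both numbers side by side. This short sketch recomputes the price multiple and the score gain from the figures quoted above; `relative_change` is a helper introduced here for illustration.

```python
def relative_change(old, new):
    """Fractional change from old to new (0.07 means +7%)."""
    return (new - old) / old


# Figures quoted in this section.
price_ratio = 3.50 / 1.25             # Gemini 3 Pro vs 2.5 Pro, input $/1M
score_gain = relative_change(84, 90)  # perf score, 2.5 Pro -> 3 Pro

print(f"price: {price_ratio:.1f}x, score: {score_gain:+.0%}")
# prints "price: 2.8x, score: +7%"
```

A 2.8x price multiple for a 7% score gain only pays off where those extra points remove failures you are currently paying for in retries or manual review.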

Gemini 3 Ultra: Google's New Frontier Model

At $10/1M input and $30/1M output, Gemini 3 Ultra enters the premium tier alongside GPT-5 and Claude Opus 4.7. The 2M token context window is the largest of any commercial frontier model — equivalent to roughly 1,500,000 words or multiple large codebases in a single prompt.

Prompt caching is available at $2.50/1M — a 75% discount. For any workload with repeated large context (document analysis suites, multi-file coding agents), this brings the effective cost significantly closer to GPT-5 and Claude Opus.
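How much caching helps depends on what share of your input tokens is served from cache. The sketch below blends the Ultra base and cached rates for an assumed hit fraction; `effective_input_price` is a hypothetical helper, and it ignores any cache storage or write fees, which this article does not cover.

```python
def effective_input_price(base_price, cached_price, hit_fraction):
    """Blend base and cached $/1M rates by the fraction of input tokens cached."""
    if not 0.0 <= hit_fraction <= 1.0:
        raise ValueError("hit_fraction must be between 0 and 1")
    return (1.0 - hit_fraction) * base_price + hit_fraction * cached_price


if __name__ == "__main__":
    # Gemini 3 Ultra: $10/1M base input, $2.50/1M cached (from this section).
    for hit in (0.0, 0.5, 0.8):
        price = effective_input_price(10.00, 2.50, hit)
        print(f"{hit:.0%} of input cached -> ${price:.2f}/1M input")
```

Under these assumptions, an 80% cached share puts Ultra's blended input rate at about $4/1M. Comparing that against competitors' uncached list prices flatters Ultra, so apply the same blending to their cached rates before drawing conclusions.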

| Model | Input /1M | Output /1M | Context | Perf Score |
|---|---|---|---|---|
| Gemini 3 Ultra | $10 | $30 | 2M | 97 |
| GPT-5 | $8 | $32 | 1M | 96 |
| Claude Opus 4.7 | $5 | $25 | 1M | 92 |

Gemini 3 Flash: The New Default for Most Teams

Gemini 3 Flash at $0.50/1M input is 67% more expensive than Gemini 2.5 Flash ($0.30/1M) but brings a meaningfully higher capability floor — particularly on structured output, code generation, and instruction following. For teams already using Gemini 2.5 Flash for production workloads, the upgrade math depends on your quality requirements.

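At 50,000 requests/day with 1,000 input + 400 output tokens per request, moving from Gemini 2.5 Flash to Gemini 3 Flash adds approximately $300/month in input cost (1.5B input tokens over a 30-day month at an extra $0.20/1M). The output-side difference depends on Gemini 2.5 Flash's output rate, which is not listed here. If the upgrade buys fewer retries and less post-processing, it can be cost-neutral or better.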
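The $300 figure can be reproduced from the input prices alone. This sketch assumes a 30-day month and uses only rates quoted in this article; `monthly_cost` is a helper introduced here for illustration. Gemini 2.5 Flash's output rate is not given in the article, so the output side is computed for Gemini 3 Flash only.

```python
REQUESTS_PER_DAY = 50_000
INPUT_TOKENS = 1_000    # per request
OUTPUT_TOKENS = 400     # per request
DAYS_PER_MONTH = 30     # assumption: 30-day billing month


def monthly_cost(price_per_million, tokens_per_request,
                 requests_per_day=REQUESTS_PER_DAY, days=DAYS_PER_MONTH):
    """USD per month for one token stream (input or output) at a $/1M rate."""
    monthly_tokens = requests_per_day * tokens_per_request * days
    return monthly_tokens / 1_000_000 * price_per_million


input_25_flash = monthly_cost(0.30, INPUT_TOKENS)   # Gemini 2.5 Flash input
input_3_flash = monthly_cost(0.50, INPUT_TOKENS)    # Gemini 3 Flash input
output_3_flash = monthly_cost(2.00, OUTPUT_TOKENS)  # Gemini 3 Flash output

print(f"input delta: ${input_3_flash - input_25_flash:,.0f}/month")
print(f"Gemini 3 Flash total: ${input_3_flash + output_3_flash:,.0f}/month")
```

Swap in your own request volume and token counts to see whether the input-side increase is material next to what you already spend on output tokens.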

Flash-Lite: Still the Cheapest 1M Context Option

Gemini 3 Flash-Lite at $0.12/1M input is among the cheapest production-quality models with a 1M context window. It outperforms Gemini 2.5 Flash-Lite on most benchmarks while maintaining the same price tier. For bulk classification, data enrichment, and high-volume summarization that doesn't need frontier-level quality, it's the strongest budget option in the Gemini 3 family.

Which Gemini 3 Model Should You Use?

| Use case | Model | Input /1M | Note |
|---|---|---|---|
| Frontier reasoning, complex agents, large codebase analysis | Gemini 3 Ultra | $10 | Use only when quality demands it |
| Production LLM apps, RAG pipelines, balanced workloads | Gemini 3 Pro | $3.50 | Best quality-to-cost in the series |
| Real-time chatbots, summarization, code completion | Gemini 3 Flash | $0.50 | Fast with strong instruction following |
| Bulk classification, data enrichment, high-volume extraction | Gemini 3 Flash-Lite | $0.12 | Cheapest capable Gemini 3 option |
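If budget is the binding constraint, one rough way to apply the guidance above is to walk the tiers from most to least capable and stop at the first one that fits. This is a sketch under that assumption, not a recommendation engine: `most_capable_within_budget` is a hypothetical helper, the tier order follows the article's Ultra > Pro > Flash > Flash-Lite ranking, and quality requirements are not modeled.

```python
# (name, input $/1M, output $/1M), ordered most to least capable
# following the tier ranking used in this article.
TIERS = [
    ("Gemini 3 Ultra", 10.00, 30.00),
    ("Gemini 3 Pro", 3.50, 14.00),
    ("Gemini 3 Flash", 0.50, 2.00),
    ("Gemini 3 Flash-Lite", 0.12, 0.48),
]


def most_capable_within_budget(monthly_input_tokens, monthly_output_tokens,
                               budget_usd):
    """Return (model_name, monthly_cost) for the first tier within budget.

    Returns None when even the cheapest tier exceeds the budget.
    """
    for name, input_price, output_price in TIERS:
        cost = (monthly_input_tokens * input_price
                + monthly_output_tokens * output_price) / 1_000_000
        if cost <= budget_usd:
            return name, cost
    return None


if __name__ == "__main__":
    # 1.5B input + 600M output tokens per month, $2,000 budget.
    print(most_capable_within_budget(1_500_000_000, 600_000_000, 2_000))
```

Treat the result as a starting point, then test whether the next tier down still meets your quality bar before committing.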

Prompt Caching with Gemini 3

All Gemini 3 models except Flash-Lite support prompt caching. The cache discount is a consistent 75% across the series: lower than Anthropic's 90% but higher than OpenAI's 50%. For workloads where the system prompt or document context exceeds 1,000 tokens and repeats across requests, caching remains the single highest-leverage optimization.
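The 75% figure can be checked directly against the pricing table, and the same rates give an upper bound on what a repeated prompt prefix saves. In this sketch, `cache_discount` and `monthly_cache_savings` are hypothetical helpers; the savings estimate assumes every request hits the cache and ignores any storage or cache-write fees, which this article does not cover.

```python
# model: (input $/1M, cached $/1M), copied from the pricing table.
CACHE_RATES = {
    "Gemini 3 Ultra": (10.00, 2.50),
    "Gemini 3 Pro": (3.50, 0.875),
    "Gemini 3 Flash": (0.50, 0.125),
}


def cache_discount(input_price, cached_price):
    """Fractional discount of the cached rate relative to the base input rate."""
    return 1.0 - cached_price / input_price


def monthly_cache_savings(model, prefix_tokens, requests_per_month):
    """Upper-bound USD saved per month if the shared prefix is always cached."""
    input_price, cached_price = CACHE_RATES[model]
    saved_per_million = input_price - cached_price
    return prefix_tokens * requests_per_month * saved_per_million / 1_000_000


if __name__ == "__main__":
    for model, (base, cached) in CACHE_RATES.items():
        print(f"{model}: {cache_discount(base, cached):.0%} discount")
    # Example: 2,000-token system prompt, 1M requests/month on Pro.
    print(f"${monthly_cache_savings('Gemini 3 Pro', 2_000, 1_000_000):,.0f}")
```

Real savings will be lower whenever cache entries expire between requests, so measure your actual hit rate before budgeting against this number.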

See the full guide: Prompt Caching: Save Up to 90% on LLM API Costs →

Bottom Line

Gemini 3 is a genuine step up from 2.5 — but whether to upgrade depends on your workload. For cost-sensitive high-volume applications already running well on Gemini 2.5 Flash, the upgrade is optional. For quality-critical workloads hitting Gemini 2.5's limits, Gemini 3 Pro or Ultra is worth the cost increase.

Use our token cost calculator to compare exact Gemini 3 costs for your token volumes, or see the full Gemini 3 Pro vs 2.5 Pro comparison →
