
Gemini 3 API Pricing: Ultra, Pro, Flash & Flash-Lite Compared (2026)

Google's Gemini 3 series is here — Ultra, Pro, Flash, and Flash-Lite. Full pricing breakdown, how each model compares to Gemini 2.5, and which to use for your workload.

TokenCost Editorial · LLM Cost Research · Updated 2026-05-20 · 6 min read

Google announced the Gemini 3 series in May 2026, bringing a new tier structure: Ultra (frontier reasoning), Pro (balanced production), Flash (speed-optimized), and Flash-Lite (budget). This guide breaks down every pricing tier, compares Gemini 3 to its predecessors, and tells you which model to use for each workload.

Gemini 3 Pricing at a Glance

Model	Input /1M	Output /1M	Cached /1M	Context
Gemini 3 Ultra	$10.00	$30.00	$2.50	2M
Gemini 3 Pro	$3.50	$14.00	$0.875	1M
Gemini 3 Flash	$0.50	$2.00	$0.125	1M
Gemini 3 Flash-Lite	$0.12	$0.48	n/a	1M
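To turn the per-million rates above into a per-request figure, a minimal sketch follows. The dictionary keys are hypothetical labels chosen for illustration, not official API model IDs, and the calculation assumes uncached list prices with no other fees.

```python
# Prices in USD per 1M tokens, copied from the table above.
# The keys are illustrative labels, not official API model identifiers.
PRICES = {
    "gemini-3-ultra": {"input": 10.00, "output": 30.00},
    "gemini-3-pro": {"input": 3.50, "output": 14.00},
    "gemini-3-flash": {"input": 0.50, "output": 2.00},
    "gemini-3-flash-lite": {"input": 0.12, "output": 0.48},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one uncached request at list price."""
    rates = PRICES[model]
    total = input_tokens * rates["input"] + output_tokens * rates["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Example request shape: 1,000 input tokens and 400 output tokens.
    for name in PRICES:
        print(f"{name}: ${request_cost(name, 1_000, 400):.6f} per request")
```

Multiplying the result by your daily request count gives a quick first estimate before caching or retries are taken into account.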

What Changed from Gemini 2.5?

Gemini 3 Pro ($3.50/1M input) costs 2.8x as much as Gemini 2.5 Pro ($1.25/1M input) but delivers benchmark improvements across reasoning, coding, and multimodal tasks. For cost-sensitive workloads where Gemini 2.5 Pro already works well, upgrading is not automatically justified. For tasks that were hitting quality limits, the improvement is meaningful.

Model	Input /1M	Perf Score
Gemini 2.5 Pro	$1.25	84
Gemini 3 Pro (NEW)	$3.50	90 (+7%)
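The trade-off between price and score can be checked with a few lines of arithmetic. This is only a sketch using the figures quoted above; "input dollars per score point" is an illustrative ratio introduced here, not a metric the article defines.

```python
# Figures quoted above: USD per 1M input tokens and the article's perf scores.
OLD_PRICE, NEW_PRICE = 1.25, 3.50
OLD_SCORE, NEW_SCORE = 84, 90


def price_ratio() -> float:
    """How many times the old input price the new input price is."""
    return NEW_PRICE / OLD_PRICE


def score_gain() -> float:
    """Relative perf-score improvement as a fraction (0.07 means 7%)."""
    return NEW_SCORE / OLD_SCORE - 1


def dollars_per_point(price: float, score: int) -> float:
    """Illustrative ratio: input price divided by perf score."""
    return price / score


if __name__ == "__main__":
    print(f"price ratio: {price_ratio():.2f}x")
    print(f"score gain: {score_gain():.1%}")
    print(f"2.5 Pro: ${dollars_per_point(OLD_PRICE, OLD_SCORE):.4f} per point")
    print(f"3 Pro:   ${dollars_per_point(NEW_PRICE, NEW_SCORE):.4f} per point")
```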

Gemini 3 Ultra: Google's New Frontier Model

At $10/1M input and $30/1M output, Gemini 3 Ultra enters the premium tier alongside GPT-5 and Claude Opus 4.7. The 2M token context window is the largest of any commercial frontier model — equivalent to roughly 1,500,000 words or multiple large codebases in a single prompt.

Prompt caching is available at $2.50/1M, a 75% discount on the $10 input rate. For any workload with repeated large context (document analysis suites, multi-file coding agents), this brings the effective input cost significantly closer to GPT-5 and Claude Opus 4.7.
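How much caching helps depends on what share of your input tokens are served from cache. A minimal sketch of the blended input rate, assuming a simple linear mix of the cached and full rates quoted above and ignoring any cache storage or write fees (which this article does not list):

```python
def blended_input_rate(
    hit_fraction: float,
    full_rate: float = 10.00,   # Gemini 3 Ultra input, USD per 1M tokens
    cached_rate: float = 2.50,  # Gemini 3 Ultra cached input, USD per 1M tokens
) -> float:
    """Effective USD per 1M input tokens for a given cached-token share."""
    if not 0.0 <= hit_fraction <= 1.0:
        raise ValueError("hit_fraction must be between 0 and 1")
    return hit_fraction * cached_rate + (1.0 - hit_fraction) * full_rate


if __name__ == "__main__":
    for share in (0.0, 0.5, 0.8, 0.95):
        rate = blended_input_rate(share)
        print(f"{share:.0%} cached: ${rate:.2f} per 1M input tokens")
```

The other providers offer caching too, so a fair comparison applies the same blend to their rates rather than comparing a cached Ultra rate against their uncached list prices.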

Model	Input /1M	Output /1M	Context	Perf Score
Gemini 3 Ultra	$10	$30	2M	97
GPT-5	$8	$32	1M	96
Claude Opus 4.7	$5	$25	1M	92
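Which of these three is cheapest for a given request depends on the mix of input and output tokens. The sketch below uses only the list prices from the table above, with no caching; the function names are illustrative.

```python
# (input, output) list prices in USD per 1M tokens, from the table above.
FRONTIER = {
    "Gemini 3 Ultra": (10.00, 30.00),
    "GPT-5": (8.00, 32.00),
    "Claude Opus 4.7": (5.00, 25.00),
}


def cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one uncached request at list price."""
    in_rate, out_rate = FRONTIER[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


def cheapest(input_tokens: int, output_tokens: int) -> str:
    """Name of the lowest-cost model for this request shape."""
    return min(FRONTIER, key=lambda m: cost(m, input_tokens, output_tokens))


if __name__ == "__main__":
    # A long-context request: 100k input tokens, 2k output tokens.
    for name in FRONTIER:
        print(f"{name}: ${cost(name, 100_000, 2_000):.3f}")
    print("cheapest:", cheapest(100_000, 2_000))
```

At these list prices, Gemini 3 Ultra is $2 per 1M more than GPT-5 on input and $2 per 1M less on output, so it undercuts GPT-5 only when a request produces more output tokens than it consumes as input. Claude Opus 4.7 has the lowest rate on both sides, so on price alone it is cheapest for every mix.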

Gemini 3 Flash: The New Default for Most Teams

Gemini 3 Flash at $0.50/1M input is 67% more expensive than Gemini 2.5 Flash ($0.30/1M) but brings a meaningfully higher capability floor — particularly on structured output, code generation, and instruction following. For teams already using Gemini 2.5 Flash for production workloads, the upgrade math depends on your quality requirements.

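At 50,000 requests/day with 1,000 input + 400 output tokens, moving from Gemini 2.5 Flash to Gemini 3 Flash adds approximately $300/month in input cost (about 1.5B input tokens over a 30-day month, at $0.20/1M more). The output-side difference depends on Gemini 2.5 Flash's output rate, which is not listed here. If the upgrade buys fewer retries and less post-processing, it can be cost-neutral or better.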
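The monthly figure can be reproduced from the input rates alone. This sketch assumes a 30-day month and covers input tokens only, since the article does not list an output rate for Gemini 2.5 Flash.

```python
def monthly_input_delta(
    requests_per_day: int,
    input_tokens_per_request: int,
    old_rate: float,  # USD per 1M input tokens
    new_rate: float,  # USD per 1M input tokens
    days: int = 30,   # assumed month length
) -> float:
    """Extra USD per month spent on input tokens after switching models."""
    monthly_tokens = requests_per_day * input_tokens_per_request * days
    return monthly_tokens / 1_000_000 * (new_rate - old_rate)


if __name__ == "__main__":
    delta = monthly_input_delta(50_000, 1_000, old_rate=0.30, new_rate=0.50)
    print(f"extra input cost: ${delta:,.2f} per month")
```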

Flash-Lite: Still the Cheapest 1M Context Option

Gemini 3 Flash-Lite at $0.12/1M input is among the cheapest production-quality models with a 1M context window. It outperforms Gemini 2.5 Flash-Lite on most benchmarks while maintaining the same price tier. For bulk classification, data enrichment, and high-volume summarization that doesn't need frontier-level quality, it's the strongest budget option in the Gemini 3 family.

Which Gemini 3 Model Should You Use?

Model	Best for	Input /1M	Notes
Gemini 3 Ultra	Frontier reasoning, complex agents, large codebase analysis	$10	Use only when quality demands it
Gemini 3 Pro	Production LLM apps, RAG pipelines, balanced workloads	$3.50	Best quality-to-cost in the series
Gemini 3 Flash	Real-time chatbots, summarization, code completion	$0.50	Fast with strong instruction following
Gemini 3 Flash-Lite	Bulk classification, data enrichment, high-volume extraction	$0.12	Cheapest capable Gemini 3 option
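If budget is the binding constraint, one rough way to apply these recommendations is to pick the highest tier whose projected monthly bill fits. The sketch below is a simplification: it uses uncached list prices, takes the tier ordering from this article, and ignores workload fit entirely. The function name is illustrative.

```python
from typing import Optional

# (name, input rate, output rate) in USD per 1M tokens,
# ordered from most to least capable as described in this article.
TIERS = [
    ("Gemini 3 Ultra", 10.00, 30.00),
    ("Gemini 3 Pro", 3.50, 14.00),
    ("Gemini 3 Flash", 0.50, 2.00),
    ("Gemini 3 Flash-Lite", 0.12, 0.48),
]


def highest_tier_within_budget(
    input_millions: float, output_millions: float, budget_usd: float
) -> Optional[str]:
    """Return the most capable tier whose monthly cost fits the budget."""
    for name, in_rate, out_rate in TIERS:
        monthly = input_millions * in_rate + output_millions * out_rate
        if monthly <= budget_usd:
            return name
    return None  # even Flash-Lite exceeds the budget


if __name__ == "__main__":
    # 1,500M input and 600M output tokens per month, $2,000 budget.
    print(highest_tier_within_budget(1_500, 600, 2_000))
```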

Prompt Caching with Gemini 3

All Gemini 3 models except Flash-Lite support prompt caching. The cache discount is consistently 75% across the series: lower than Anthropic's 90% but higher than OpenAI's 50%. For workloads where system prompts or document context exceeds 1,000 tokens and repeats across requests, caching remains the single highest-leverage optimization.
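The 75% figure follows directly from the cached and full input rates in the pricing table, and the same rates give a rough savings estimate for a repeated prompt prefix. This sketch assumes every counted request hits the cache and ignores any storage fees or cache expiry, neither of which is covered in this article.

```python
# (full input rate, cached input rate) in USD per 1M tokens,
# from the pricing table. Flash-Lite is omitted because it has no cached rate.
RATES = {
    "Gemini 3 Ultra": (10.00, 2.50),
    "Gemini 3 Pro": (3.50, 0.875),
    "Gemini 3 Flash": (0.50, 0.125),
}


def cache_discount(model: str) -> float:
    """Fractional discount on cached input tokens (0.75 means 75%)."""
    full, cached = RATES[model]
    return 1.0 - cached / full


def prefix_savings(model: str, prefix_tokens: int, cached_requests: int) -> float:
    """USD saved when a shared prefix is read from cache on each request."""
    full, cached = RATES[model]
    return prefix_tokens * cached_requests / 1_000_000 * (full - cached)


if __name__ == "__main__":
    for name in RATES:
        print(f"{name}: {cache_discount(name):.0%} discount")
    saved = prefix_savings("Gemini 3 Pro", 1_000, 1_500_000)
    print(f"1,000-token prefix over 1.5M requests: ${saved:,.2f} saved")
```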

See the full guide: Prompt Caching: Save Up to 90% on LLM API Costs →

Bottom Line

Gemini 3 is a genuine step up from 2.5 — but whether to upgrade depends on your workload. For cost-sensitive high-volume applications already running well on Gemini 2.5 Flash, the upgrade is optional. For quality-critical workloads hitting Gemini 2.5's limits, Gemini 3 Pro or Ultra is worth the cost increase.

Use our token cost calculator to compare exact Gemini 3 costs for your token volumes, or see the Gemini 3 Pro vs 2.5 Pro full comparison →
