# TokenCost — LLM API Pricing Calculator

> Real-time LLM API pricing data for 53 models across 8 providers. Compare token costs, context windows, and performance scores. Free calculator for developers.

TokenCost tracks exact API pricing (input/1M tokens, output/1M tokens, cached/1M tokens) for every major LLM provider. Data is updated weekly. All prices are in USD.

## Tools

- [Token Cost Calculator](https://tokencostcalculators.com/): Interactive calculator — enter token counts, get exact monthly cost for any model
- [Model Comparison Tool](https://tokencostcalculators.com/compare/): Side-by-side comparison of any two LLM models on price and performance
- [Token Counter](https://tokencostcalculators.com/token-counter/): Count tokens for any text using GPT-4 / Claude tokenizers
- [Cost Routing Calculator](https://tokencostcalculators.com/cost-routing/): Model what you save by routing requests to cheaper models
- [Image Pricing Calculator](https://tokencostcalculators.com/image-pricing/): DALL-E, Stable Diffusion image generation cost calculator
- [All Models Index](https://tokencostcalculators.com/models/): Full list of all tracked LLM models with pricing

## Benchmark Pages

- [Cheapest LLM API 2026](https://tokencostcalculators.com/cheapest-llm-api/): Ranked list of cheapest LLMs by input price, output price, and use case
- [Best Value LLM 2026](https://tokencostcalculators.com/best-value-llm-2026/): Performance-per-dollar ranking across all tracked models
- [100k Token Cost Comparison](https://tokencostcalculators.com/100k-tokens-cost/): What 100,000 tokens costs on every model
- [1M Token Cost Comparison](https://tokencostcalculators.com/1m-tokens-cost/): What 1 million tokens costs on every model

## Provider Calculators

- [OpenAI API Cost Calculator](https://tokencostcalculators.com/openai-api-cost-calculator/): GPT-5, GPT-4.1, GPT-4o, o3, o4-mini pricing
- [Anthropic Claude API Cost Calculator](https://tokencostcalculators.com/anthropic-api-cost-calculator/): Claude Opus 4.7, Sonnet 4.6, Haiku 4.5 pricing
- [Google Gemini API Cost Calculator](https://tokencostcalculators.com/google-gemini-api-cost-calculator/): Gemini 3 Ultra/Pro/Flash, Gemini 2.5 Pro/Flash pricing
- [DeepSeek API Cost Calculator](https://tokencostcalculators.com/deepseek-api-cost-calculator/): DeepSeek R2, R1, Chat pricing
- [Mistral API Cost Calculator](https://tokencostcalculators.com/mistral-api-cost-calculator/): Magistral, Mistral Large, Codestral pricing
- [Meta Llama API Cost Calculator](https://tokencostcalculators.com/meta-llama-api-cost-calculator/): Llama 4 Maverick, Scout, Llama 3.x pricing
- [xAI Grok API Cost Calculator](https://tokencostcalculators.com/xai-grok-api-cost-calculator/): Grok 3, Grok 3 Fast, Grok 3 Mini pricing
- [Qwen API Cost Calculator](https://tokencostcalculators.com/qwen-api-cost-calculator/): Qwen3 235B, Qwen3.5 Flash pricing

## Current LLM API Pricing Data

All prices in USD per 1 million tokens. Updated May 2026.

### OpenAI

- **GPT-5** (gpt-5): input $8/1M, output $32/1M, cached input $4/1M, context 1M — [details](https://tokencostcalculators.com/models/gpt-5/)
- **GPT-5 Mini** (gpt-5-mini): input $0.6/1M, output $2.4/1M, cached input $0.3/1M, context 512k — [details](https://tokencostcalculators.com/models/gpt-5-mini/)
- **GPT-4.1** (gpt-4.1): input $2/1M, output $8/1M, cached input $1/1M, context 1M — [details](https://tokencostcalculators.com/models/gpt-4.1/)
- **GPT-4.1 Mini** (gpt-4.1-mini): input $0.4/1M, output $1.6/1M, context 1M — [details](https://tokencostcalculators.com/models/gpt-4.1-mini/)
- **GPT-4.1 Nano** (gpt-4.1-nano): input $0.1/1M, output $0.4/1M, context 1M — [details](https://tokencostcalculators.com/models/gpt-4.1-nano/)
- **o4-mini** (o4-mini): input $1.1/1M, output $4.4/1M, context 200k [reasoning model] — [details](https://tokencostcalculators.com/models/o4-mini/)
- **o3** (o3): input $0.4/1M, output $1.6/1M, context 200k [reasoning model] — [details](https://tokencostcalculators.com/models/o3/)
- **o1** (o1): input $15/1M, output $60/1M, cached input $7.5/1M, context 200k [reasoning model] — [details](https://tokencostcalculators.com/models/o1/)
- **GPT-4o** (gpt-4o): input $2.5/1M, output $10/1M, cached input $1.25/1M, context 128k — [details](https://tokencostcalculators.com/models/gpt-4o/)
- **GPT-4o Mini** (gpt-4o-mini): input $0.15/1M, output $0.6/1M, context 128k — [details](https://tokencostcalculators.com/models/gpt-4o-mini/)
- **GPT-4 Turbo** (gpt-4-turbo): input $10/1M, output $30/1M, context 128k — [details](https://tokencostcalculators.com/models/gpt-4-turbo/)
- **GPT-3.5 Turbo** (gpt-3-5-turbo): input $0.5/1M, output $1.5/1M, context 16k — [details](https://tokencostcalculators.com/models/gpt-3-5-turbo/)

### Anthropic

- **Claude Fable 5** (claude-fable-5): input $10/1M, output $50/1M, cached input $1/1M, context 1M — [details](https://tokencostcalculators.com/models/claude-fable-5/)
- **Claude Opus 4.8** (claude-opus-4-8): input $5/1M, output $25/1M, cached input $0.5/1M, context 1M — [details](https://tokencostcalculators.com/models/claude-opus-4-8/)
- **Claude Opus 4.7** (claude-opus-4-7): input $5/1M, output $25/1M, cached input $0.5/1M, context 1M — [details](https://tokencostcalculators.com/models/claude-opus-4-7/)
- **Claude Sonnet 4.6** (claude-sonnet-4-6): input $3/1M, output $15/1M, cached input $0.3/1M, context 1M — [details](https://tokencostcalculators.com/models/claude-sonnet-4-6/)
- **Claude Opus 4.6** (claude-opus-4-6): input $5/1M, output $25/1M, context 1M — [details](https://tokencostcalculators.com/models/claude-opus-4-6/)
- **Claude Opus 4.5** (claude-opus-4-5): input $15/1M, output $75/1M, cached input $1.5/1M, context 200k — [details](https://tokencostcalculators.com/models/claude-opus-4-5/)
- **Claude Haiku 4.5** (claude-haiku-4-5-20251001): input $1/1M, output $5/1M, cached input $0.1/1M, context 200k — [details](https://tokencostcalculators.com/models/claude-haiku-4-5-20251001/)
- **Claude 3.5 Sonnet** (claude-3-5-sonnet-20241022): input $3/1M, output $15/1M, cached input $0.3/1M, context 200k — [details](https://tokencostcalculators.com/models/claude-3-5-sonnet-20241022/)
- **Claude 3.5 Haiku** (claude-3-5-haiku-20241022): input $0.8/1M, output $4/1M, cached input $0.08/1M, context 200k — [details](https://tokencostcalculators.com/models/claude-3-5-haiku-20241022/)
- **Claude 3 Opus** (claude-3-opus-20240229): input $15/1M, output $75/1M, cached input $1.5/1M, context 200k — [details](https://tokencostcalculators.com/models/claude-3-opus-20240229/)

### Google

- **Gemini 3 Ultra** (gemini-3-ultra): input $10/1M, output $30/1M, cached input $2.5/1M, context 2M — [details](https://tokencostcalculators.com/models/gemini-3-ultra/)
- **Gemini 3 Pro** (gemini-3-pro): input $3.5/1M, output $14/1M, cached input $0.875/1M, context 1M — [details](https://tokencostcalculators.com/models/gemini-3-pro/)
- **Gemini 3 Flash** (gemini-3-flash): input $0.5/1M, output $2/1M, cached input $0.125/1M, context 1M — [details](https://tokencostcalculators.com/models/gemini-3-flash/)
- **Gemini 3 Flash-Lite** (gemini-3-flash-lite): input $0.12/1M, output $0.48/1M, context 1M — [details](https://tokencostcalculators.com/models/gemini-3-flash-lite/)
- **Gemini 2.5 Pro** (gemini-2.5-pro): input $1.25/1M, output $10/1M, cached input $0.31/1M, context 1M — [details](https://tokencostcalculators.com/models/gemini-2.5-pro/)
- **Gemini 2.5 Flash** (gemini-2.5-flash): input $0.3/1M, output $2.5/1M, cached input $0.075/1M, context 1M — [details](https://tokencostcalculators.com/models/gemini-2.5-flash/)
- **Gemini 2.5 Flash-Lite** (gemini-2.5-flash-lite): input $0.1/1M, output $0.4/1M, context 1M — [details](https://tokencostcalculators.com/models/gemini-2.5-flash-lite/)
- **Gemini 2.0 Flash** (gemini-2.0-flash): input $0.1/1M, output $0.4/1M, cached input $0.025/1M, context 1M — [details](https://tokencostcalculators.com/models/gemini-2.0-flash/)
- **Gemini 2.0 Flash-Lite** (gemini-2.0-flash-lite): input $0.075/1M, output $0.3/1M, context 1M — [details](https://tokencostcalculators.com/models/gemini-2.0-flash-lite/)
- **Gemini 1.5 Pro** (gemini-1.5-pro): input $1.25/1M, output $5/1M, cached input $0.31/1M, context 2M — [details](https://tokencostcalculators.com/models/gemini-1.5-pro/)

### Mistral

- **Magistral Medium** (magistral-medium): input $2/1M, output $5/1M, context 128k — [details](https://tokencostcalculators.com/models/magistral-medium/)
- **Mistral Large 3** (mistral-large-3): input $0.5/1M, output $1.5/1M, context 256k — [details](https://tokencostcalculators.com/models/mistral-large-3/)
- **Mistral Medium 3** (mistral-medium-3): input $0.4/1M, output $2/1M, context 128k — [details](https://tokencostcalculators.com/models/mistral-medium-3/)
- **Mistral Small 3.1** (mistral-small-3-1): input $0.1/1M, output $0.3/1M, context 128k — [details](https://tokencostcalculators.com/models/mistral-small-3-1/)
- **Codestral** (codestral-2501): input $0.3/1M, output $0.9/1M, context 256k — [details](https://tokencostcalculators.com/models/codestral-2501/)
- **Mistral Nemo** (mistral-nemo): input $0.1/1M, output $0.3/1M, context 128k — [details](https://tokencostcalculators.com/models/mistral-nemo/)

### DeepSeek

- **DeepSeek R2** (deepseek-r2): input $0.8/1M, output $3.2/1M, cached input $0.2/1M, context 128k [reasoning model] — [details](https://tokencostcalculators.com/models/deepseek-r2/)
- **DeepSeek R1** (deepseek-r1): input $0.55/1M, output $2.19/1M, cached input $0.14/1M, context 64k [reasoning model] — [details](https://tokencostcalculators.com/models/deepseek-r1/)
- **DeepSeek Chat** (deepseek-chat): input $0.27/1M, output $1.1/1M, cached input $0.07/1M, context 64k — [details](https://tokencostcalculators.com/models/deepseek-chat/)

### Meta

- **Llama 4 Maverick** (llama-4-maverick): input $0.5/1M, output $1.1/1M, context 1M — [details](https://tokencostcalculators.com/models/llama-4-maverick/)
- **Llama 4 Scout** (llama-4-scout): input $0.17/1M, output $0.17/1M, context 10M — [details](https://tokencostcalculators.com/models/llama-4-scout/)
- **Llama 3.1 405B** (llama-3-1-405b): input $3.5/1M, output $3.5/1M, context 128k — [details](https://tokencostcalculators.com/models/llama-3-1-405b/)
- **Llama 3.3 70B** (llama-3-3-70b): input $0.23/1M, output $0.4/1M, context 128k — [details](https://tokencostcalculators.com/models/llama-3-3-70b/)
- **Llama 3.1 8B** (llama-3-1-8b): input $0.02/1M, output $0.05/1M, context 128k — [details](https://tokencostcalculators.com/models/llama-3-1-8b/)

### xAI

- **Grok 3** (grok-3): input $3/1M, output $15/1M, context 128k — [details](https://tokencostcalculators.com/models/grok-3/)
- **Grok 3 Fast** (grok-3-fast): input $5/1M, output $25/1M, context 128k — [details](https://tokencostcalculators.com/models/grok-3-fast/)
- **Grok 3 Mini** (grok-3-mini): input $0.3/1M, output $0.5/1M, context 128k — [details](https://tokencostcalculators.com/models/grok-3-mini/)

### Qwen (Alibaba)

- **Qwen3.5 Flash** (qwen3-5-flash): input $0.01/1M, output $0.05/1M, context 256k — [details](https://tokencostcalculators.com/models/qwen3-5-flash/)
- **Qwen3 235B** (qwen3-235b): input $0.06/1M, output $0.06/1M, context 32k — [details](https://tokencostcalculators.com/models/qwen3-235b/)
- **Qwen3 30B** (qwen3-30b): input $0.1/1M, output $0.15/1M, context 32k — [details](https://tokencostcalculators.com/models/qwen3-30b/)
- **Qwen3 8B** (qwen3-8b): input $0.05/1M, output $0.1/1M, context 32k — [details](https://tokencostcalculators.com/models/qwen3-8b/)

## Blog — LLM Cost Guides

- [Cheapest LLM API in 2026: Full Price Comparison](https://tokencostcalculators.com/blog/cheapest-llm-api-2026/): We compared 26 LLM models across 8 providers to find the cheapest API for every use case — from bulk processing to complex reasoning.
- [7 Ways to Reduce Your OpenAI API Cost by 80%](https://tokencostcalculators.com/blog/reduce-openai-api-cost/): Practical techniques to dramatically cut your OpenAI API bill: prompt caching, model routing, batch API, and token optimization strategies.
- [GPT vs Claude vs Gemini: Pricing & Performance in 2026](https://tokencostcalculators.com/blog/gpt-vs-claude-vs-gemini-pricing-2026/): A detailed comparison of OpenAI, Anthropic, and Google's pricing models, context windows, and value for different workloads.
- [Prompt Caching: Save Up to 90% on LLM API Costs](https://tokencostcalculators.com/blog/prompt-caching-guide/): Everything you need to know about prompt caching across Anthropic, OpenAI, and Google — how it works, when to use it, and how much you save.
- [DeepSeek API Pricing Guide 2026: R1 vs Chat](https://tokencostcalculators.com/blog/deepseek-api-cost-guide/): How DeepSeek R1 and Chat pricing compares to GPT-4o and Claude Sonnet — and when it makes sense to switch for your workload.
- [Mistral API Pricing Guide 2026: Magistral, Large & Codestral Compared](https://tokencostcalculators.com/blog/mistral-api-pricing-guide/): Complete pricing breakdown for all Mistral AI models — Magistral reasoning, Codestral for code, Mistral Large vs GPT-4o, and EU data residency options.
- [Llama 4 API Cost Guide: Maverick vs Scout vs Self-Hosting](https://tokencostcalculators.com/blog/llama-4-api-cost-guide/): Meta Llama 4 pricing explained — Maverick vs Scout, hosted API vs self-hosting economics, and when Llama 3.1 8B is still the cheapest capable option.
- [Gemini 2.5 Pro vs GPT-4o: Pricing & Performance in 2026](https://tokencostcalculators.com/blog/gemini-2-5-pro-vs-gpt-4o-cost/): Detailed cost comparison of Google Gemini 2.5 Pro vs OpenAI GPT-4o — monthly pricing at scale, where each model wins, and when to use Gemini 2.5 Flash instead.
- [Best LLM API for Chatbots in 2026: Cost vs Quality Breakdown](https://tokencostcalculators.com/blog/best-llm-api-for-chatbot-2026/): Which LLM API should you use for your chatbot? We compare cost, quality, and context window for customer support, RAG, and high-volume use cases.
- [Claude API Pricing 2026: Every Model, Every Tier Explained](https://tokencostcalculators.com/blog/claude-api-pricing-2026/): Complete guide to Anthropic Claude API pricing — Opus, Sonnet, and Haiku tiers, prompt caching discounts, and how Claude compares to GPT-4o at scale.
- [GPT-4o Mini vs Claude Haiku 4.5: Cost & Quality Comparison 2026](https://tokencostcalculators.com/blog/gpt-4o-mini-vs-claude-haiku-cost/): Head-to-head comparison of the two most popular small LLM APIs — pricing, performance, caching advantages, and which to choose for your use case.
- [8 Proven Ways to Reduce LLM API Costs by 60–90%](https://tokencostcalculators.com/blog/how-to-reduce-llm-api-costs/): Practical techniques to dramatically cut your LLM API bill: model routing, prompt caching, batch API, output control, and provider switching strategies.
- [OpenAI o3 vs DeepSeek R1: Reasoning Model Cost Comparison 2026](https://tokencostcalculators.com/blog/o3-vs-deepseek-r1-cost/): How much cheaper is DeepSeek R1 than o3? Benchmark scores, monthly cost at scale, and which reasoning model to choose for your workload.
- [LLM Tokens Explained: What They Are and How They Affect Your API Bill](https://tokencostcalculators.com/blog/llm-tokens-explained/): What is a token, how many tokens is your content, and exactly how does token count translate to API cost? Everything developers need to know.
- [OpenAI Batch API: How to Cut Costs by 50% on Bulk Requests](https://tokencostcalculators.com/blog/openai-batch-api-cost-savings/): A practical guide to OpenAI's Batch API — how it works, which models support it, real savings calculations, and how to combine it with prompt caching for maximum cost reduction.
- [Gemini 3 API Pricing: Ultra, Pro, Flash & Flash-Lite Compared (2026)](https://tokencostcalculators.com/blog/gemini-3-api-pricing-2026/): Google's Gemini 3 series is here — Ultra, Pro, Flash, and Flash-Lite. Full pricing breakdown, how each model compares to Gemini 2.5, and which to use for your workload.
- [GPT-5 API Pricing: Is It Worth 4x the Cost of GPT-4.1?](https://tokencostcalculators.com/blog/gpt-5-api-pricing-2026/): OpenAI's GPT-5 is out at $8/1M input — 4x more than GPT-4.1. We break down when the upgrade is worth it, how GPT-5 Mini competes, and what this means for your monthly bill.
- [DeepSeek R2 vs R1: What Changed, and Is It Worth Switching?](https://tokencostcalculators.com/blog/deepseek-r2-vs-r1-cost/): DeepSeek R2 is faster, smarter, and has 2x the context window of R1 — at $0.80/1M vs $0.55/1M. We compare benchmarks, costs, and use cases to help you decide.
- [GPT-4.1 vs GPT-4o: Pricing, Context Window & When to Upgrade (2026)](https://tokencostcalculators.com/blog/gpt-4-1-vs-gpt-4o-cost/): GPT-4.1 costs 20% less than GPT-4o and has an 8x larger context window. We compare pricing, performance scores, and real monthly costs to help you decide when to switch.
- [Best LLM API for Coding in 2026: By Use Case, Budget & Team Size](https://tokencostcalculators.com/blog/best-llm-api-for-coding-2026/): Claude Opus 4.7 for agents, Sonnet 4.6 for code review, Codestral for autocomplete, GPT-4.1 Mini for bulk tasks — a practical guide to picking the right model for coding.
- [Claude Sonnet 4.6 vs Opus 4.7: Is the 1.67x Cost Jump Worth It?](https://tokencostcalculators.com/blog/claude-sonnet-vs-opus-2026/): Opus 4.7 costs 1.67x more than Sonnet 4.6. We break down exactly which workloads justify the premium — and where Sonnet is the smarter default.
- [xAI Grok API Pricing 2026: Grok-3, Grok-3 Fast & Grok-3 Mini Compared](https://tokencostcalculators.com/blog/xai-grok-api-pricing-2026/): Grok 3 at $3/1M input competes with Claude Sonnet on price but lacks prompt caching. We compare costs, performance, and whether real-time web access justifies choosing Grok.
- [Token Optimization: 7 Techniques to Cut Your LLM API Token Count by 50%](https://tokencostcalculators.com/blog/token-optimization-guide/): Compress system prompts, switch to structured output, truncate context, and control output length. Practical token optimization techniques that reduce API costs by 40–60% without hurting quality.
- [LLM Context Window Management: RAG vs Compression vs Full Context](https://tokencostcalculators.com/blog/llm-context-window-management/): When to use RAG, when to summarize conversation history, and when a 1M token context window is actually worth the cost. A practical decision guide for developers.
- [LLM Model Routing: How to Save 50–70% by Sending Requests to the Right Model](https://tokencostcalculators.com/blog/llm-model-routing-cost-savings/): Route simple requests to cheap models and complex ones to frontier models. Practical guide to rule-based, LLM-based, and semantic routing — with real cost calculations.

## Optional

- [About TokenCost](https://tokencostcalculators.com/about/): Methodology, data sources, and team
- [Contact](https://tokencostcalculators.com/contact/): Report pricing errors or request new models
- [Privacy Policy](https://tokencostcalculators.com/privacy-policy/): Data and cookie policy
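
The per-1M-token prices listed under "Current LLM API Pricing Data" can be turned into a dollar figure with simple arithmetic. The Python sketch below is illustrative only and is not the calculator's actual implementation: `ModelPrice` and `request_cost` are hypothetical names, and it assumes that cached tokens are a subset of the input tokens billed at the cached rate, that models without a listed cached price bill all input at the normal rate, and that there are no cache-write surcharges or batch discounts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelPrice:
    """USD prices per 1 million tokens (hypothetical container type)."""
    input_per_1m: float
    output_per_1m: float
    cached_per_1m: Optional[float] = None  # None when no cached price is listed


def request_cost(price: ModelPrice, input_tokens: int, output_tokens: int,
                 cached_tokens: int = 0) -> float:
    """Return the USD cost of a workload under the assumptions stated above."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    # Assumption: fall back to the normal input rate if no cached rate exists.
    cached_rate = (price.cached_per_1m
                   if price.cached_per_1m is not None
                   else price.input_per_1m)
    fresh_tokens = input_tokens - cached_tokens
    total = (fresh_tokens * price.input_per_1m
             + cached_tokens * cached_rate
             + output_tokens * price.output_per_1m)
    return total / 1_000_000


# GPT-4.1 figures as listed in this file: input $2, output $8, cached input $1.
gpt_4_1 = ModelPrice(input_per_1m=2.0, output_per_1m=8.0, cached_per_1m=1.0)

print(request_cost(gpt_4_1, 1_000_000, 250_000))                         # 4.0
print(request_cost(gpt_4_1, 1_000_000, 250_000, cached_tokens=500_000))  # 3.5
```

Multiplying the per-request result by an expected monthly request count gives a rough monthly estimate; actual provider billing rules may differ from these assumptions.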