# TokenCost — LLM API Pricing Calculator

> Real-time LLM API pricing data for 53 models across 8 providers. Compare token costs, context windows, and performance scores. Free calculator for developers.

TokenCost tracks API pricing for 8 LLM providers at three rates: input, output, and cached input, each quoted per 1M tokens. Data is updated weekly. All prices are in USD.

## Tools

- [Token Cost Calculator](https://tokencostcalculators.com/): Interactive calculator — enter token counts, get exact monthly cost for any model
- [Model Comparison Tool](https://tokencostcalculators.com/compare/): Side-by-side comparison of any two LLM models on price and performance
- [Token Counter](https://tokencostcalculators.com/token-counter/): Count tokens for any text using GPT-4 / Claude tokenizers
- [Cost Routing Calculator](https://tokencostcalculators.com/cost-routing/): Model what you save by routing requests to cheaper models
- [Image Pricing Calculator](https://tokencostcalculators.com/image-pricing/): DALL-E, Stable Diffusion image generation cost calculator
- [All Models Index](https://tokencostcalculators.com/models/): Full list of all tracked LLM models with pricing
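
The kind of saving the Cost Routing Calculator models can be approximated with a blended-cost formula. The sketch below is only an illustration, not the calculator's actual method: the function names, the traffic volume, the per-request token counts, and the 70% routing share are all assumptions. The prices are the GPT-4.1 and GPT-4.1 Nano figures listed in this file.

```python
def monthly_cost(requests, in_tokens, out_tokens, in_price, out_price):
    """USD cost of `requests` calls, with prices given in USD per 1M tokens."""
    return requests * (in_tokens * in_price + out_tokens * out_price) / 1_000_000


def routed_cost(requests, in_tokens, out_tokens, expensive, cheap, cheap_share):
    """Blended cost when `cheap_share` of requests go to the cheaper model."""
    cheap_requests = requests * cheap_share
    expensive_requests = requests - cheap_requests
    return (monthly_cost(cheap_requests, in_tokens, out_tokens, *cheap)
            + monthly_cost(expensive_requests, in_tokens, out_tokens, *expensive))


# (input, output) prices in USD per 1M tokens, taken from the OpenAI list.
GPT_41 = (2.0, 8.0)
GPT_41_NANO = (0.1, 0.4)

# Hypothetical workload: 1M requests/month, 1,000 input and 500 output tokens each.
baseline = monthly_cost(1_000_000, 1_000, 500, *GPT_41)
routed = routed_cost(1_000_000, 1_000, 500, GPT_41, GPT_41_NANO, cheap_share=0.7)
savings = 1 - routed / baseline

print(f"baseline ${baseline:,.0f}, routed ${routed:,.0f}, savings {savings:.1%}")
```

The sketch assumes both models see the same token counts per request and ignores any quality difference between them, so treat the result as an upper-level estimate rather than a forecast.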

## Benchmark Pages

- [Cheapest LLM API 2026](https://tokencostcalculators.com/cheapest-llm-api/): Ranked list of cheapest LLMs by input price, output price, and use case
- [Best Value LLM 2026](https://tokencostcalculators.com/best-value-llm-2026/): Performance-per-dollar ranking across all tracked models
- [100k Token Cost Comparison](https://tokencostcalculators.com/100k-tokens-cost/): What 100,000 tokens costs on every model
- [1M Token Cost Comparison](https://tokencostcalculators.com/1m-tokens-cost/): What 1 million tokens costs on every model
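
A fixed-size comparison like the ones above can be sketched by pricing the same token budget on several models and sorting the results. How the linked pages split a budget between input and output is not stated here, so the 80/20 split below is an assumption, as are the function names and the choice of models. The prices are copied from the lists in this file.

```python
# (input, output) prices in USD per 1M tokens, copied from this file.
PRICES = {
    "claude-haiku-4-5-20251001": (1.0, 5.0),
    "gpt-4.1-nano": (0.1, 0.4),
    "llama-3-1-8b": (0.02, 0.05),
    "qwen3-5-flash": (0.01, 0.05),
}


def workload_cost(prices, total_tokens, input_share):
    """USD cost of `total_tokens`, with `input_share` of them billed as input."""
    input_price, output_price = prices
    input_tokens = total_tokens * input_share
    output_tokens = total_tokens - input_tokens
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def rank_by_cost(total_tokens=100_000, input_share=0.8):
    """Return (model, cost) pairs sorted from cheapest to most expensive."""
    costs = {model: workload_cost(p, total_tokens, input_share)
             for model, p in PRICES.items()}
    return sorted(costs.items(), key=lambda item: item[1])


for model, cost in rank_by_cost():
    print(f"{model}: ${cost:.4f}")
```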

## Provider Calculators

- [OpenAI API Cost Calculator](https://tokencostcalculators.com/openai-api-cost-calculator/): GPT-5, GPT-4.1, GPT-4o, o3, o4-mini pricing
- [Anthropic Claude API Cost Calculator](https://tokencostcalculators.com/anthropic-api-cost-calculator/): Claude Opus 4.7, Sonnet 4.6, Haiku 4.5 pricing
- [Google Gemini API Cost Calculator](https://tokencostcalculators.com/google-gemini-api-cost-calculator/): Gemini 3 Ultra/Pro/Flash, Gemini 2.5 Pro/Flash pricing
- [DeepSeek API Cost Calculator](https://tokencostcalculators.com/deepseek-api-cost-calculator/): DeepSeek R2, R1, Chat pricing
- [Mistral API Cost Calculator](https://tokencostcalculators.com/mistral-api-cost-calculator/): Magistral, Mistral Large, Codestral pricing
- [Meta Llama API Cost Calculator](https://tokencostcalculators.com/meta-llama-api-cost-calculator/): Llama 4 Maverick, Scout, Llama 3.x pricing
- [xAI Grok API Cost Calculator](https://tokencostcalculators.com/xai-grok-api-cost-calculator/): Grok 3, Grok 3 Fast, Grok 3 Mini pricing
- [Qwen API Cost Calculator](https://tokencostcalculators.com/qwen-api-cost-calculator/): Qwen3 235B, Qwen3.5 Flash pricing

## Current LLM API Pricing Data

All prices in USD per 1 million tokens. Updated May 2026.
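
As a worked example of reading these figures, the sketch below converts per-1M prices into the cost of a single request. The function name `request_cost` and the token counts are illustrative assumptions; the prices are the GPT-4.1 figures from the OpenAI list below.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Return the USD cost of one request, given prices in USD per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# GPT-4.1: input $2/1M, output $8/1M.
# Hypothetical request: 10,000 input tokens and 2,000 output tokens.
cost = request_cost(10_000, 2_000, input_price=2.0, output_price=8.0)

# 10,000 * 2 + 2,000 * 8 = 36,000; divided by 1,000,000 gives 0.036.
print(f"${cost:.4f}")  # prints "$0.0360"
```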

### OpenAI

- **GPT-5** (gpt-5): input $8/1M, output $32/1M, cached input $4/1M, context 1M — [details](https://tokencostcalculators.com/models/gpt-5/)
- **GPT-5 Mini** (gpt-5-mini): input $0.6/1M, output $2.4/1M, cached input $0.3/1M, context 512k — [details](https://tokencostcalculators.com/models/gpt-5-mini/)
- **GPT-4.1** (gpt-4.1): input $2/1M, output $8/1M, cached input $1/1M, context 1M — [details](https://tokencostcalculators.com/models/gpt-4.1/)
- **GPT-4.1 Mini** (gpt-4.1-mini): input $0.4/1M, output $1.6/1M, context 1M — [details](https://tokencostcalculators.com/models/gpt-4.1-mini/)
- **GPT-4.1 Nano** (gpt-4.1-nano): input $0.1/1M, output $0.4/1M, context 1M — [details](https://tokencostcalculators.com/models/gpt-4.1-nano/)
- **o4-mini** (o4-mini): input $1.1/1M, output $4.4/1M, context 200k [reasoning model] — [details](https://tokencostcalculators.com/models/o4-mini/)
- **o3** (o3): input $0.4/1M, output $1.6/1M, context 200k [reasoning model] — [details](https://tokencostcalculators.com/models/o3/)
- **o1** (o1): input $15/1M, output $60/1M, cached input $7.5/1M, context 200k [reasoning model] — [details](https://tokencostcalculators.com/models/o1/)
- **GPT-4o** (gpt-4o): input $2.5/1M, output $10/1M, cached input $1.25/1M, context 128k — [details](https://tokencostcalculators.com/models/gpt-4o/)
- **GPT-4o Mini** (gpt-4o-mini): input $0.15/1M, output $0.6/1M, context 128k — [details](https://tokencostcalculators.com/models/gpt-4o-mini/)
- **GPT-4 Turbo** (gpt-4-turbo): input $10/1M, output $30/1M, context 128k — [details](https://tokencostcalculators.com/models/gpt-4-turbo/)
- **GPT-3.5 Turbo** (gpt-3-5-turbo): input $0.5/1M, output $1.5/1M, context 16k — [details](https://tokencostcalculators.com/models/gpt-3-5-turbo/)

### Anthropic

- **Claude Fable 5** (claude-fable-5): input $10/1M, output $50/1M, cached input $1/1M, context 1M — [details](https://tokencostcalculators.com/models/claude-fable-5/)
- **Claude Opus 4.8** (claude-opus-4-8): input $5/1M, output $25/1M, cached input $0.5/1M, context 1M — [details](https://tokencostcalculators.com/models/claude-opus-4-8/)
- **Claude Opus 4.7** (claude-opus-4-7): input $5/1M, output $25/1M, cached input $0.5/1M, context 1M — [details](https://tokencostcalculators.com/models/claude-opus-4-7/)
- **Claude Sonnet 4.6** (claude-sonnet-4-6): input $3/1M, output $15/1M, cached input $0.3/1M, context 1M — [details](https://tokencostcalculators.com/models/claude-sonnet-4-6/)
- **Claude Opus 4.6** (claude-opus-4-6): input $5/1M, output $25/1M, context 1M — [details](https://tokencostcalculators.com/models/claude-opus-4-6/)
- **Claude Opus 4.5** (claude-opus-4-5): input $15/1M, output $75/1M, cached input $1.5/1M, context 200k — [details](https://tokencostcalculators.com/models/claude-opus-4-5/)
- **Claude Haiku 4.5** (claude-haiku-4-5-20251001): input $1/1M, output $5/1M, cached input $0.1/1M, context 200k — [details](https://tokencostcalculators.com/models/claude-haiku-4-5-20251001/)
- **Claude 3.5 Sonnet** (claude-3-5-sonnet-20241022): input $3/1M, output $15/1M, cached input $0.3/1M, context 200k — [details](https://tokencostcalculators.com/models/claude-3-5-sonnet-20241022/)
- **Claude 3.5 Haiku** (claude-3-5-haiku-20241022): input $0.8/1M, output $4/1M, cached input $0.08/1M, context 200k — [details](https://tokencostcalculators.com/models/claude-3-5-haiku-20241022/)
- **Claude 3 Opus** (claude-3-opus-20240229): input $15/1M, output $75/1M, cached input $1.5/1M, context 200k — [details](https://tokencostcalculators.com/models/claude-3-opus-20240229/)
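
For every model in the list above that reports a cached input price, that price is one tenth of the standard input price. A rough way to see what this means for a bill is to blend the two rates by cache hit rate, as in the sketch below. The function name and the 80% hit rate are assumptions, and the sketch ignores any cache-write surcharge or cache expiry rules a provider may apply.

```python
def effective_input_price(input_price: float, cached_price: float,
                          cache_hit_rate: float) -> float:
    """Blend standard and cached input prices (USD per 1M tokens) by hit rate."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    return cache_hit_rate * cached_price + (1.0 - cache_hit_rate) * input_price


# Claude Sonnet 4.6: input $3/1M, cached input $0.3/1M.
# Hypothetical workload where 80% of input tokens are served from cache.
price = effective_input_price(3.0, 0.3, cache_hit_rate=0.8)

# 0.8 * 0.3 + 0.2 * 3.0 = 0.24 + 0.60 = 0.84
print(f"${price:.2f} per 1M input tokens")  # prints "$0.84 per 1M input tokens"
```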

### Google

- **Gemini 3 Ultra** (gemini-3-ultra): input $10/1M, output $30/1M, cached input $2.5/1M, context 2M — [details](https://tokencostcalculators.com/models/gemini-3-ultra/)
- **Gemini 3 Pro** (gemini-3-pro): input $3.5/1M, output $14/1M, cached input $0.875/1M, context 1M — [details](https://tokencostcalculators.com/models/gemini-3-pro/)
- **Gemini 3 Flash** (gemini-3-flash): input $0.5/1M, output $2/1M, cached input $0.125/1M, context 1M — [details](https://tokencostcalculators.com/models/gemini-3-flash/)
- **Gemini 3 Flash-Lite** (gemini-3-flash-lite): input $0.12/1M, output $0.48/1M, context 1M — [details](https://tokencostcalculators.com/models/gemini-3-flash-lite/)
- **Gemini 2.5 Pro** (gemini-2.5-pro): input $1.25/1M, output $10/1M, cached input $0.31/1M, context 1M — [details](https://tokencostcalculators.com/models/gemini-2.5-pro/)
- **Gemini 2.5 Flash** (gemini-2.5-flash): input $0.3/1M, output $2.5/1M, cached input $0.075/1M, context 1M — [details](https://tokencostcalculators.com/models/gemini-2.5-flash/)
- **Gemini 2.5 Flash-Lite** (gemini-2.5-flash-lite): input $0.1/1M, output $0.4/1M, context 1M — [details](https://tokencostcalculators.com/models/gemini-2.5-flash-lite/)
- **Gemini 2.0 Flash** (gemini-2.0-flash): input $0.1/1M, output $0.4/1M, cached input $0.025/1M, context 1M — [details](https://tokencostcalculators.com/models/gemini-2.0-flash/)
- **Gemini 2.0 Flash-Lite** (gemini-2.0-flash-lite): input $0.075/1M, output $0.3/1M, context 1M — [details](https://tokencostcalculators.com/models/gemini-2.0-flash-lite/)
- **Gemini 1.5 Pro** (gemini-1.5-pro): input $1.25/1M, output $5/1M, cached input $0.31/1M, context 2M — [details](https://tokencostcalculators.com/models/gemini-1.5-pro/)

### Mistral

- **Magistral Medium** (magistral-medium): input $2/1M, output $5/1M, context 128k — [details](https://tokencostcalculators.com/models/magistral-medium/)
- **Mistral Large 3** (mistral-large-3): input $0.5/1M, output $1.5/1M, context 256k — [details](https://tokencostcalculators.com/models/mistral-large-3/)
- **Mistral Medium 3** (mistral-medium-3): input $0.4/1M, output $2/1M, context 128k — [details](https://tokencostcalculators.com/models/mistral-medium-3/)
- **Mistral Small 3.1** (mistral-small-3-1): input $0.1/1M, output $0.3/1M, context 128k — [details](https://tokencostcalculators.com/models/mistral-small-3-1/)
- **Codestral** (codestral-2501): input $0.3/1M, output $0.9/1M, context 256k — [details](https://tokencostcalculators.com/models/codestral-2501/)
- **Mistral Nemo** (mistral-nemo): input $0.1/1M, output $0.3/1M, context 128k — [details](https://tokencostcalculators.com/models/mistral-nemo/)

### DeepSeek

- **DeepSeek R2** (deepseek-r2): input $0.8/1M, output $3.2/1M, cached input $0.2/1M, context 128k [reasoning model] — [details](https://tokencostcalculators.com/models/deepseek-r2/)
- **DeepSeek R1** (deepseek-r1): input $0.55/1M, output $2.19/1M, cached input $0.14/1M, context 64k [reasoning model] — [details](https://tokencostcalculators.com/models/deepseek-r1/)
- **DeepSeek Chat** (deepseek-chat): input $0.27/1M, output $1.1/1M, cached input $0.07/1M, context 64k — [details](https://tokencostcalculators.com/models/deepseek-chat/)

### Meta

- **Llama 4 Maverick** (llama-4-maverick): input $0.5/1M, output $1.1/1M, context 1M — [details](https://tokencostcalculators.com/models/llama-4-maverick/)
- **Llama 4 Scout** (llama-4-scout): input $0.17/1M, output $0.17/1M, context 10M — [details](https://tokencostcalculators.com/models/llama-4-scout/)
- **Llama 3.1 405B** (llama-3-1-405b): input $3.5/1M, output $3.5/1M, context 128k — [details](https://tokencostcalculators.com/models/llama-3-1-405b/)
- **Llama 3.3 70B** (llama-3-3-70b): input $0.23/1M, output $0.4/1M, context 128k — [details](https://tokencostcalculators.com/models/llama-3-3-70b/)
- **Llama 3.1 8B** (llama-3-1-8b): input $0.02/1M, output $0.05/1M, context 128k — [details](https://tokencostcalculators.com/models/llama-3-1-8b/)

### xAI

- **Grok 3** (grok-3): input $3/1M, output $15/1M, context 128k — [details](https://tokencostcalculators.com/models/grok-3/)
- **Grok 3 Fast** (grok-3-fast): input $5/1M, output $25/1M, context 128k — [details](https://tokencostcalculators.com/models/grok-3-fast/)
- **Grok 3 Mini** (grok-3-mini): input $0.3/1M, output $0.5/1M, context 128k — [details](https://tokencostcalculators.com/models/grok-3-mini/)

### Qwen (Alibaba)

- **Qwen3.5 Flash** (qwen3-5-flash): input $0.01/1M, output $0.05/1M, context 256k — [details](https://tokencostcalculators.com/models/qwen3-5-flash/)
- **Qwen3 235B** (qwen3-235b): input $0.06/1M, output $0.06/1M, context 32k — [details](https://tokencostcalculators.com/models/qwen3-235b/)
- **Qwen3 30B** (qwen3-30b): input $0.1/1M, output $0.15/1M, context 32k — [details](https://tokencostcalculators.com/models/qwen3-30b/)
- **Qwen3 8B** (qwen3-8b): input $0.05/1M, output $0.1/1M, context 32k — [details](https://tokencostcalculators.com/models/qwen3-8b/)
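
The model entries above follow a regular pattern, so they can be turned into structured records with a regular expression. The sketch below is one possible parser, assuming the line layout stays exactly as it appears in this file; the names `LINE_RE` and `parse_model_line` are hypothetical, and the sample lines are shortened to omit the trailing details link.

```python
import re
from typing import Optional

# Matches: - **Name** (id): input $X/1M, output $Y/1M[, cached input $Z/1M], context N[k|M]
LINE_RE = re.compile(
    r"^- \*\*(?P<name>.+?)\*\* \((?P<id>[^)]+)\): "
    r"input \$(?P<input>[\d.]+)/1M, output \$(?P<output>[\d.]+)/1M"
    r"(?:, cached input \$(?P<cached>[\d.]+)/1M)?"
    r", context (?P<context>[\d.]+[kM])"
    r"(?P<reasoning> \[reasoning model\])?"
)

UNITS = {"k": 1_000, "M": 1_000_000}


def parse_model_line(line: str) -> Optional[dict]:
    """Parse one pricing bullet into a dict, or return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    context = match.group("context")
    cached = match.group("cached")
    return {
        "name": match.group("name"),
        "id": match.group("id"),
        "input": float(match.group("input")),
        "output": float(match.group("output")),
        "cached": float(cached) if cached else None,
        "context_tokens": int(float(context[:-1]) * UNITS[context[-1]]),
        "reasoning": match.group("reasoning") is not None,
    }


# Sample lines from this file, with the trailing details link left off.
gpt5 = parse_model_line(
    "- **GPT-5** (gpt-5): input $8/1M, output $32/1M, cached input $4/1M, context 1M")
o4_mini = parse_model_line(
    "- **o4-mini** (o4-mini): input $1.1/1M, output $4.4/1M, context 200k [reasoning model]")

print(gpt5)
print(o4_mini)
```

Prices are kept as floats in USD per 1M tokens, and `cached` is `None` for entries that list no cached input price.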

## Blog — LLM Cost Guides

- [Cheapest LLM API in 2026: Full Price Comparison](https://tokencostcalculators.com/blog/cheapest-llm-api-2026/): We compared 26 LLM models across 8 providers to find the cheapest API for every use case — from bulk processing to complex reasoning.
- [7 Ways to Reduce Your OpenAI API Cost by 80%](https://tokencostcalculators.com/blog/reduce-openai-api-cost/): Practical techniques to dramatically cut your OpenAI API bill: prompt caching, model routing, batch API, and token optimization strategies.
- [GPT vs Claude vs Gemini: Pricing & Performance in 2026](https://tokencostcalculators.com/blog/gpt-vs-claude-vs-gemini-pricing-2026/): A detailed comparison of OpenAI, Anthropic, and Google's pricing models, context windows, and value for different workloads.
- [Prompt Caching: Save Up to 90% on LLM API Costs](https://tokencostcalculators.com/blog/prompt-caching-guide/): Everything you need to know about prompt caching across Anthropic, OpenAI, and Google — how it works, when to use it, and how much you save.
- [DeepSeek API Pricing Guide 2026: R1 vs Chat](https://tokencostcalculators.com/blog/deepseek-api-cost-guide/): How DeepSeek R1 and Chat pricing compares to GPT-4o and Claude Sonnet — and when it makes sense to switch for your workload.
- [Mistral API Pricing Guide 2026: Magistral, Large & Codestral Compared](https://tokencostcalculators.com/blog/mistral-api-pricing-guide/): Complete pricing breakdown for all Mistral AI models — Magistral reasoning, Codestral for code, Mistral Large vs GPT-4o, and EU data residency options.
- [Llama 4 API Cost Guide: Maverick vs Scout vs Self-Hosting](https://tokencostcalculators.com/blog/llama-4-api-cost-guide/): Meta Llama 4 pricing explained — Maverick vs Scout, hosted API vs self-hosting economics, and when Llama 3.1 8B is still the cheapest capable option.
- [Gemini 2.5 Pro vs GPT-4o: Pricing & Performance in 2026](https://tokencostcalculators.com/blog/gemini-2-5-pro-vs-gpt-4o-cost/): Detailed cost comparison of Google Gemini 2.5 Pro vs OpenAI GPT-4o — monthly pricing at scale, where each model wins, and when to use Gemini 2.5 Flash instead.
- [Best LLM API for Chatbots in 2026: Cost vs Quality Breakdown](https://tokencostcalculators.com/blog/best-llm-api-for-chatbot-2026/): Which LLM API should you use for your chatbot? We compare cost, quality, and context window for customer support, RAG, and high-volume use cases.
- [Claude API Pricing 2026: Every Model, Every Tier Explained](https://tokencostcalculators.com/blog/claude-api-pricing-2026/): Complete guide to Anthropic Claude API pricing — Opus, Sonnet, and Haiku tiers, prompt caching discounts, and how Claude compares to GPT-4o at scale.
- [GPT-4o Mini vs Claude Haiku 4.5: Cost & Quality Comparison 2026](https://tokencostcalculators.com/blog/gpt-4o-mini-vs-claude-haiku-cost/): Head-to-head comparison of the two most popular small LLM APIs — pricing, performance, caching advantages, and which to choose for your use case.
- [8 Proven Ways to Reduce LLM API Costs by 60–90%](https://tokencostcalculators.com/blog/how-to-reduce-llm-api-costs/): Practical techniques to dramatically cut your LLM API bill: model routing, prompt caching, batch API, output control, and provider switching strategies.
- [OpenAI o3 vs DeepSeek R1: Reasoning Model Cost Comparison 2026](https://tokencostcalculators.com/blog/o3-vs-deepseek-r1-cost/): How much cheaper is DeepSeek R1 than o3? Benchmark scores, monthly cost at scale, and which reasoning model to choose for your workload.
- [LLM Tokens Explained: What They Are and How They Affect Your API Bill](https://tokencostcalculators.com/blog/llm-tokens-explained/): What is a token, how many tokens is your content, and exactly how does token count translate to API cost? Everything developers need to know.
- [OpenAI Batch API: How to Cut Costs by 50% on Bulk Requests](https://tokencostcalculators.com/blog/openai-batch-api-cost-savings/): A practical guide to OpenAI's Batch API — how it works, which models support it, real savings calculations, and how to combine it with prompt caching for maximum cost reduction.
- [Gemini 3 API Pricing: Ultra, Pro, Flash & Flash-Lite Compared (2026)](https://tokencostcalculators.com/blog/gemini-3-api-pricing-2026/): Google's Gemini 3 series is here — Ultra, Pro, Flash, and Flash-Lite. Full pricing breakdown, how each model compares to Gemini 2.5, and which to use for your workload.
- [GPT-5 API Pricing: Is It Worth 4x the Cost of GPT-4.1?](https://tokencostcalculators.com/blog/gpt-5-api-pricing-2026/): OpenAI's GPT-5 is out at $8/1M input, 4x the price of GPT-4.1 ($2/1M). We break down when the upgrade is worth it, how GPT-5 Mini competes, and what this means for your monthly bill.
- [DeepSeek R2 vs R1: What Changed, and Is It Worth Switching?](https://tokencostcalculators.com/blog/deepseek-r2-vs-r1-cost/): DeepSeek R2 is faster, smarter, and has 2x the context window of R1 (128k vs 64k), at $0.80/1M input vs $0.55/1M. We compare benchmarks, costs, and use cases to help you decide.
- [GPT-4.1 vs GPT-4o: Pricing, Context Window & When to Upgrade (2026)](https://tokencostcalculators.com/blog/gpt-4-1-vs-gpt-4o-cost/): GPT-4.1 costs 20% less than GPT-4o and has roughly 8x the context window (1M vs 128k tokens). We compare pricing, performance scores, and real monthly costs to help you decide when to switch.
- [Best LLM API for Coding in 2026: By Use Case, Budget & Team Size](https://tokencostcalculators.com/blog/best-llm-api-for-coding-2026/): Claude Opus 4.7 for agents, Sonnet 4.6 for code review, Codestral for autocomplete, GPT-4.1 Mini for bulk tasks — a practical guide to picking the right model for coding.
- [Claude Sonnet 4.6 vs Opus 4.7: Is the 1.67x Cost Jump Worth It?](https://tokencostcalculators.com/blog/claude-sonnet-vs-opus-2026/): Opus 4.7 costs 1.67x as much as Sonnet 4.6 ($5 vs $3 per 1M input tokens, $25 vs $15 per 1M output tokens). We break down exactly which workloads justify the premium, and where Sonnet is the smarter default.
- [xAI Grok API Pricing 2026: Grok-3, Grok-3 Fast & Grok-3 Mini Compared](https://tokencostcalculators.com/blog/xai-grok-api-pricing-2026/): Grok 3 at $3/1M input competes with Claude Sonnet on price but lacks prompt caching. We compare costs, performance, and whether real-time web access justifies choosing Grok.
- [Token Optimization: 7 Techniques to Cut Your LLM API Token Count by 50%](https://tokencostcalculators.com/blog/token-optimization-guide/): Compress system prompts, switch to structured output, truncate context, and control output length. Practical token optimization techniques that reduce API costs by 40–60% without hurting quality.
- [LLM Context Window Management: RAG vs Compression vs Full Context](https://tokencostcalculators.com/blog/llm-context-window-management/): When to use RAG, when to summarize conversation history, and when a 1M token context window is actually worth the cost. A practical decision guide for developers.
- [LLM Model Routing: How to Save 50–70% by Sending Requests to the Right Model](https://tokencostcalculators.com/blog/llm-model-routing-cost-savings/): Route simple requests to cheap models and complex ones to frontier models. Practical guide to rule-based, LLM-based, and semantic routing — with real cost calculations.

## Optional

- [About TokenCost](https://tokencostcalculators.com/about/): Methodology, data sources, and team
- [Contact](https://tokencostcalculators.com/contact/): Report pricing errors or request new models
- [Privacy Policy](https://tokencostcalculators.com/privacy-policy/): Data and cookie policy