
LLM Tokens Explained: What They Are and How They Affect Your API Bill

What is a token, how many tokens is your content, and exactly how does token count translate to API cost? Everything developers need to know.

TokenCost Editorial · LLM Cost Research · Updated 2026-05-02 · 5 min read

Every LLM API bill is denominated in tokens — but what exactly is a token, how many tokens does your content use, and how does token count translate to real dollars? This guide explains everything you need to know to understand and control your LLM API costs.

What Is a Token?

A token is a chunk of text — roughly 3–4 characters in English, or about 0.75 words. Tokenization is how LLMs break text into pieces their neural networks can process. The exact split depends on the model's tokenizer:

Token Examples

Text | Tokens | How it splits
"Hello, world!" | 4 | Hello / , / world / !
"The quick brown fox" | 4 | The / quick / brown / fox
"API pricing calculator" | 4 | API / pricing / cal / culator
"Anthropic" | 1–2 | Anthrop / ic: 2 tokens (GPT) vs 1 token (Claude)
Code: `function() {}` | 6–8 | Code tokenizes differently: brackets and symbols each count
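
As a rough illustration of why punctuation and symbols count as their own tokens, here is a toy splitter. It is not a real tokenizer: production models use learned subword vocabularies, so long or rare words are usually split further (as the "calculator" example shows) and real counts will differ. The function name `toy_split` and the regular expression are assumptions made for this sketch.

```python
import re

def toy_split(text: str) -> list[str]:
    """Split text into word-like pieces (keeping a leading space) and single symbols.

    Illustrative only: real tokenizers use learned subword vocabularies,
    so this undercounts whenever a word is broken into several pieces.
    """
    return re.findall(r" ?\w+|[^\w\s]", text)

pieces = toy_split("Hello, world!")
print(pieces, len(pieces))  # ['Hello', ',', ' world', '!'] 4
```

For real measurements, use the tokenizer or token-counting endpoint published for the model you are calling.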

Input Tokens vs Output Tokens

Every API call has two token counts that are priced separately:

Input Tokens
Includes: Your system prompt + conversation history + user message + tool definitions
Pricing: Cheaper, typically $0.10–$2.00 per 1M tokens
💡 Caching can reduce effective cost by 75–90%

Output Tokens
Includes: The model's response, meaning everything it generates
Pricing: More expensive, typically $0.40–$15 per 1M tokens (3–5x input price)
💡 Set max_tokens and instruct concise responses
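
Because the two counts are priced separately, the cost of a request is the sum of two products. The sketch below also shows how a `max_tokens` cap puts an upper bound on what a single call can cost. The function names and the example prices ($1.00 input and $5.00 output per 1M tokens) are illustrative assumptions, not any provider's actual rates.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost in dollars of one call; prices are in dollars per 1M tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

def worst_case_cost(input_tokens: int, max_tokens: int,
                    input_price_per_m: float, output_price_per_m: float) -> float:
    """Upper bound on cost: assume the response uses the entire max_tokens cap."""
    return request_cost(input_tokens, max_tokens, input_price_per_m, output_price_per_m)

# Hypothetical prices: $1.00 per 1M input tokens, $5.00 per 1M output tokens.
actual = request_cost(2_000, 300, 1.00, 5.00)
bound = worst_case_cost(2_000, 1_000, 1.00, 5.00)
print(f"actual ${actual:.4f}, worst case ${bound:.4f}")  # actual $0.0035, worst case $0.0070
```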

Tokens to Words to Cost: The Math

Rule of thumb: 1 token ≈ 0.75 words, so 1,000 words ≈ 1,333 tokens. Here's how common content types translate:

Content Type | ~Words | ~Tokens
Short chatbot reply | 50–100 | 67–133
Email or short blog post | 500 | ~667
Standard system prompt | 200–400 | 267–533
Long-form article | 2,000 | ~2,667
Full book chapter | 5,000 | ~6,667
Typical code file | 500 LOC | 3,000–8,000
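
The rule of thumb is easy to turn into a pair of helpers. This is a minimal sketch for English prose only; code and non-English text do not follow the 0.75 ratio, which is why the code row above is given as a wide range. The function names are assumptions.

```python
WORDS_PER_TOKEN = 0.75  # rule of thumb for English prose

def words_to_tokens(words: int) -> int:
    """Estimate the token count for a given number of English words."""
    return round(words / WORDS_PER_TOKEN)

def tokens_to_words(tokens: int) -> int:
    """Estimate how many English words fit in a given number of tokens."""
    return round(tokens * WORDS_PER_TOKEN)

for words in (100, 500, 2_000, 5_000):
    print(f"{words} words is about {words_to_tokens(words)} tokens")
# 100 words is about 133 tokens
# 500 words is about 667 tokens
# 2000 words is about 2667 tokens
# 5000 words is about 6667 tokens
```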

Token Cost Examples Across Models

Cost to process one article of roughly 750 words (1,000 input tokens) and generate a 500-word summary (667 output tokens):

Model | Cost
GPT-4o | $0.00917/article
Claude Haiku 4.5 | $0.00434/article
Gemini 2.5 Flash | $0.00197/article
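
The per-article figures can be reproduced with a short script. The per-1M-token rates below are assumptions: they were chosen because they reproduce the numbers above after rounding, and they may not match current list prices, so check each provider's pricing page before relying on them.

```python
# Assumed (input, output) prices in dollars per 1M tokens.
# They reproduce the per-article figures but may be out of date.
ASSUMED_PRICES = {
    "GPT-4o": (2.50, 10.00),
    "Claude Haiku 4.5": (1.00, 5.00),
    "Gemini 2.5 Flash": (0.30, 2.50),
}

def cost_per_article(model: str, input_tokens: int = 1_000,
                     output_tokens: int = 667) -> float:
    """Dollar cost of one summarization call for the given model."""
    input_price, output_price = ASSUMED_PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

for model in ASSUMED_PRICES:
    # Seven decimals shows the unrounded value behind each table entry.
    print(f"{model}: ${cost_per_article(model):.7f}/article")
```

Note that in every case the output side dominates: 667 output tokens cost more than 1,000 input tokens.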

Reasoning Tokens: The Hidden Cost

Reasoning models (OpenAI o3, o4-mini; Claude with extended thinking; DeepSeek R1) use additional “thinking” tokens before generating their final response. These reasoning tokens are billed at the standard output rate — and can be 3–10x the length of the visible response.

Example: o4-mini solves a math problem. You see a 200-token answer, but the model also generated 2,000 reasoning tokens internally, and all 2,200 output tokens are billed at $4.40 per 1M. The output cost of the request is about 11x what the visible response alone suggests.
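
The arithmetic in this example can be checked in a few lines. The sketch covers the output side only (input tokens add a separate amount on top), and the function name is an assumption.

```python
def output_cost(visible_tokens: int, reasoning_tokens: int,
                output_price_per_m: float) -> float:
    """Dollars billed for output when reasoning tokens are charged at the output rate."""
    return (visible_tokens + reasoning_tokens) * output_price_per_m / 1_000_000

PRICE = 4.40  # dollars per 1M output tokens, the o4-mini figure quoted above

visible_only = output_cost(200, 0, PRICE)
with_reasoning = output_cost(200, 2_000, PRICE)

print(f"visible only:   ${visible_only:.5f}")    # visible only:   $0.00088
print(f"with reasoning: ${with_reasoning:.5f}")  # with reasoning: $0.00968
print(f"multiplier:     {with_reasoning / visible_only:.0f}x")  # multiplier:     11x
```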

Context Window: The Token Limit

Every model has a context window — the maximum number of tokens it can process in a single API call (input + output combined). Exceeding this limit causes an error; approaching it can degrade quality.

Context window | Roughly equivalent to | Example models
4K tokens | ~3,000 words, one long email thread | Legacy GPT-3.5 limit
128K tokens | ~96,000 words, a short novel | GPT-4o
200K tokens | ~150,000 words, about one volume of The Lord of the Rings | Claude 3.5, o3, o4-mini
1M tokens | ~750,000 words, an entire codebase | Claude 4, Gemini 2.5, GPT-4.1
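
Since input and output share the window, a pre-flight check has to reserve room for the response before sending a prompt. A minimal sketch, assuming you already know the prompt's token count; the function names are hypothetical and "128K" is taken as 128,000 tokens.

```python
def fits_context(input_tokens: int, max_output_tokens: int, context_window: int) -> bool:
    """True if the prompt plus the reserved output budget fits in the window."""
    return input_tokens + max_output_tokens <= context_window

def max_input_budget(context_window: int, max_output_tokens: int) -> int:
    """Tokens left for the prompt after reserving room for the response."""
    return max(context_window - max_output_tokens, 0)

WINDOW = 128_000  # assumption: treat "128K" as 128,000 tokens

print(fits_context(120_000, 4_000, WINDOW))  # True
print(fits_context(126_000, 4_000, WINDOW))  # False
print(max_input_budget(WINDOW, 4_000))       # 124000
```

In practice it is safer to stay well below the computed budget, since quality can degrade as a prompt approaches the limit.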

