openaibatch apicost optimization

OpenAI Batch API: How to Cut Costs by 50% on Bulk Requests

A practical guide to OpenAI's Batch API — how it works, which models support it, real savings calculations, and how to combine it with prompt caching for maximum cost reduction.

TTokenCost Editorial·LLM Cost Research·Updated 2026-04-274 min read

OpenAI's Batch API lets you submit large sets of requests asynchronously and receive a 50% discount on standard API pricing. If your workload doesn't require real-time responses — data processing, content generation, classification, embeddings — the Batch API is one of the simplest ways to cut your OpenAI bill in half.

Batch API Pricing: Regular vs Batch

Model	Regular Input /1M	Batch Input /1M	Regular Output /1M	Batch Output /1M
GPT-4o	$2.5	$1.250	$10	$5.000
GPT-4.1	$2	$1.000	$8	$4.000
GPT-4o Mini	$0.15	$0.075	$0.6	$0.300

Real-World Savings Example

Say you're processing 100,000 documents per day — summarizing each with GPT-4o using ~2,000 input tokens and ~500 output tokens:

Regular API (GPT-4o)

$1000/day

$30000/month

Batch API (GPT-4o)

$500/day

$15000/month

How the Batch API Works

1. Prepare your requests

Format requests as a JSONL file — one JSON object per line, each with a custom ID, model, and messages array.

2. Upload and submit

Upload the file via the Files API, then create a batch job with your file ID. OpenAI queues the job.

3. Wait for completion

Batch jobs complete within 24 hours. For most jobs, turnaround is 1–4 hours depending on load.

4. Download results

Poll the batch status endpoint. When complete, download the output file — same JSONL format with responses keyed to your custom IDs.

Batch API vs Prompt Caching: Which Saves More?

Both techniques cut costs, but they address different scenarios:

Batch API

Saves: 50% on all tokens

Best for: Async, offline processing — document analysis, content pipelines, data enrichment

Limit: 24-hour max turnaround — no real-time use

Prompt Caching

Saves: 75–90% on repeated input tokens

Best for: Repeated system prompts, RAG context, few-shot examples

Limit: Only saves on previously cached tokens

Combining Both for Maximum Savings

The biggest cost reductions come from combining techniques. A batch job with a large, cached system prompt can achieve 60–80% total cost reduction. Example: batch summarization with a 5,000-token system prompt:

No optimization

Baseline API cost

Batch API only

50% off all tokens

50%

Prompt caching only

90% off 5K cached tokens per request

~60%

Batch API + Prompt Caching

Both discounts stack

~80%

When NOT to Use Batch API

Real-time features: Chatbots, live search, user-facing generation — anything that needs immediate response.
Streaming required: Batch API doesn't support streaming responses.
Time-sensitive pipelines: If downstream tasks depend on quick turnaround (under 1 hour), batch isn't reliable enough.

Calculate your batch savings: Open cost calculator → | Batch processing cost guide →

Cheapest LLM API in 2026: Full Price Comparison

We compared 26 LLM models across 8 providers to find the cheapest API for every use case — from bulk processing to complex reasoning.

8 min read

7 Ways to Reduce Your OpenAI API Cost by 80%

Practical techniques to dramatically cut your OpenAI API bill: prompt caching, model routing, batch API, and token optimization strategies.

6 min read

GPT vs Claude vs Gemini: Pricing & Performance in 2026

A detailed comparison of OpenAI, Anthropic, and Google's pricing models, context windows, and value for different workloads.

7 min read