openaibatch apicost optimization

OpenAI Batch API: How to Cut Costs by 50% on Bulk Requests

A practical guide to OpenAI's Batch API — how it works, which models support it, real savings calculations, and how to combine it with prompt caching for maximum cost reduction.

TTokenCost Editorial·LLM Cost Research·Updated 2026-04-274 min read

OpenAI's Batch API lets you submit large sets of requests asynchronously and receive a 50% discount on standard API pricing. If your workload doesn't require real-time responses — data processing, content generation, classification, embeddings — the Batch API is one of the simplest ways to cut your OpenAI bill in half.

Batch API Pricing: Regular vs Batch

ModelRegular Input /1MBatch Input /1MRegular Output /1MBatch Output /1M
GPT-4o$2.5$1.250$10$5.000
GPT-4.1$2$1.000$8$4.000
GPT-4o Mini$0.15$0.075$0.6$0.300

Real-World Savings Example

Say you're processing 100,000 documents per day — summarizing each with GPT-4o using ~2,000 input tokens and ~500 output tokens:

Regular API (GPT-4o)
$1000/day
$30000/month
Batch API (GPT-4o)
$500/day
$15000/month

How the Batch API Works

1. Prepare your requests
Format requests as a JSONL file — one JSON object per line, each with a custom ID, model, and messages array.
2. Upload and submit
Upload the file via the Files API, then create a batch job with your file ID. OpenAI queues the job.
3. Wait for completion
Batch jobs complete within 24 hours. For most jobs, turnaround is 1–4 hours depending on load.
4. Download results
Poll the batch status endpoint. When complete, download the output file — same JSONL format with responses keyed to your custom IDs.

Batch API vs Prompt Caching: Which Saves More?

Both techniques cut costs, but they address different scenarios:

Batch API
Saves: 50% on all tokens
Best for: Async, offline processing — document analysis, content pipelines, data enrichment
Limit: 24-hour max turnaround — no real-time use
Prompt Caching
Saves: 75–90% on repeated input tokens
Best for: Repeated system prompts, RAG context, few-shot examples
Limit: Only saves on previously cached tokens

Combining Both for Maximum Savings

The biggest cost reductions come from combining techniques. A batch job with a large, cached system prompt can achieve 60–80% total cost reduction. Example: batch summarization with a 5,000-token system prompt:

No optimization
Baseline API cost
0%
Batch API only
50% off all tokens
50%
Prompt caching only
90% off 5K cached tokens per request
~60%
Batch API + Prompt Caching
Both discounts stack
~80%

When NOT to Use Batch API

  • Real-time features: Chatbots, live search, user-facing generation — anything that needs immediate response.
  • Streaming required: Batch API doesn't support streaming responses.
  • Time-sensitive pipelines: If downstream tasks depend on quick turnaround (under 1 hour), batch isn't reliable enough.

Calculate your batch savings: Open cost calculator → | Batch processing cost guide →

Related Articles

Cheapest LLM API in 2026: Full Price Comparison
We compared 26 LLM models across 8 providers to find the cheapest API for every use case — from bulk processing to complex reasoning.
8 min read
7 Ways to Reduce Your OpenAI API Cost by 80%
Practical techniques to dramatically cut your OpenAI API bill: prompt caching, model routing, batch API, and token optimization strategies.
6 min read
GPT vs Claude vs Gemini: Pricing & Performance in 2026
A detailed comparison of OpenAI, Anthropic, and Google's pricing models, context windows, and value for different workloads.
7 min read