OpenAI API Pricing Calculator

Estimate your monthly OpenAI API cost in 30 seconds. Compare GPT-5, GPT-4o, GPT-4o mini, GPT-4.1, o1, and legacy models side-by-side. Model prompt caching and Batch API discounts. Free, no signup, pricing data refreshed weekly from OpenAI's official pricing page.

Calculate your OpenAI API monthly cost

Compare up to 4 OpenAI models side-by-side. Pricing data last refreshed 2026-05-26 from openai.com/api/pricing.

Input mode

Easier for non-developers — we estimate tokens from your conversation volume.

Conversations per month

Total user conversations expected across all bots and channels.

Messages per conversation (avg)

Counting both user and assistant turns. Typical SMB chatbot: 6-12.

Words per message (avg)

Average words per turn. Short answers: 20-40. Long explanations: 80-150.

Primary conversation language

Non-English languages use more tokens per word (Russian ~1.6×, Chinese ~2.1×).

Cost optimizations

Enable prompt cachingCached input tokens cost 10% of full price. Applies to repeated system instructions.Use Batch API50% off both input and output. Requires 24-hour async SLA — not for realtime chatbots.

Display currency

Compare models (4/4 selected)

Monthly cost comparison

GPT-4o minibalancedRecommended

$0.6750

High-volume SMB chatbots — best cost-per-quality at scale

GPT-4ofrontier

$11.25

General-purpose multimodal — proven workhorse model

GPT-5 minibalanced

$4.50

Balanced cost and quality for production chatbots

GPT-3.5 Turbobudget

$1.80

Simple classification, intent recognition, low-stakes copy

Cost-vs-quality recommendation

GPT-4o mini is 94% cheaper than GPT-4o while staying in the balanced quality tier. Best for: High-volume SMB chatbots — best cost-per-quality at scale.

Spread between cheapest and most expensive selected model: 94%.

Full per-model breakdown

Model	Monthly input	Monthly output	Monthly total	Annual	Per conv
GPT-4o mini in $0.0002/1k · out $0.0006/1k	$0.1350	$0.5400	$0.6750	$8.10	$0.0001
GPT-4o in $0.0025/1k · out $0.0100/1k	$2.25	$9.00	$11.25	$135.00	$0.0011
GPT-5 mini in $0.0010/1k · out $0.0040/1k	$0.9000	$3.60	$4.50	$54.00	$0.0005
GPT-3.5 Turbo in $0.0005/1k · out $0.0015/1k	$0.4500	$1.35	$1.80	$21.60	$0.0002

Volume basis: 900.0K input tokens + 900.0K output tokens per month across 10,000 conversations.

Save more — applicable optimizations

Use Batch API for non-realtime workloads
Est. saves $9.11/month · medium difficulty · Learn how
−50%
Enable prompt caching for repeat system instructions
Est. saves $4.56/month · easy difficulty · Learn how
−25%
Cap output token length via max_tokens parameter
Est. saves $3.65/month · easy difficulty · Learn how
−20%

Embed this calculator on your site (free)

<iframe
  src="https://chatbotscape.com/embed/tools/openai-api-pricing-calculator/"
  width="100%" height="900" frameborder="0"
  title="OpenAI API Pricing Calculator by Chatbotscape"
  loading="lazy">
</iframe>

How OpenAI's pricing actually works

OpenAI charges per token — not per request, not per message, not per «conversation». A token is roughly three-quarters of an English word (the exact ratio varies by language — Russian and Chinese consume substantially more tokens per word, see the language multiplier in the calculator's conversation mode). When you call the API, you pay for two things:

Input tokens— everything you send to the model: your system prompt, the user's message, any prior conversation context you replay for memory, any retrieved documents from RAG.
Output tokens — the model's response. Output is typically 3-4× more expensive per token than input across most OpenAI models. This asymmetry matters for cost optimization: long input contexts are cheaper than long outputs.

For a typical SMB customer support chatbot answering 10,000 conversations per month with 8 messages per conversation and 40 words per message, the calculator estimates roughly 480,000 input tokens and 480,000 output tokens per day — or about 14.4M input + 14.4M output tokens per month. At GPT-4o mini pricing, that's $2.16 input + $8.64 output = $10.80 per month. At GPT-4o pricing, the same workload costs $180 per month — a 17× cost increase for the upgrade.

When GPT-4o vs GPT-4o mini vs GPT-3.5 Turbo

The biggest cost-optimization lever in OpenAI pricing is model selection. Use the calculator above to compare exact numbers for your workload, but here's the rule-of-thumb decision framework:

GPT-4o mini — default for production SMB chatbots. Handles intent classification, FAQ retrieval, lightweight reasoning, and most customer support scenarios at 6% of the cost of GPT-4o. Use this unless you have evidence GPT-4o is measurably better for your specific use case.
GPT-4o— reach for it when conversation quality is the unique differentiator (premium B2B sales bots, healthcare triage, legal Q&A). The quality improvement over GPT-4o mini is real but workload-dependent; benchmark before committing.
GPT-5 / GPT-5 mini— currently OpenAI's flagship for complex reasoning, agentic workflows with multiple tool calls, and high-quality content generation. GPT-5 mini is the new «balanced tier» sweet spot.
GPT-3.5 Turbo— legacy. Useful only for cheap classification tasks where you specifically don't want any reasoning capability. Consider migrating to GPT-4o mini, which is comparable in cost and substantially better in quality.
o1-preview / o1-mini— reasoning models that «think» before responding. Significantly more expensive per output token because they generate internal reasoning chains. Use only when the reasoning matters (math, complex code review, multi-step planning) — not for chat.

Top 5 cost optimization tactics

1. Enable prompt caching

If your prompts include a long static system instruction (e.g., your bot's persona, brand voice rules, escalation criteria), enable prompt caching. Cached input tokens cost about 10% of full price. For a typical bot with 300-token system prompt and 10,000 calls per month, this saves 20-30% on input cost. Implementation difficulty: easy — just add cache_control markers to your prompt structure.

2. Use Batch API for non-realtime workloads

OpenAI's Batch API gives you 50% off both input and output tokens in exchange for a 24-hour SLA. Perfect for: bulk classification, weekly embedding refreshes, large RAG knowledge base indexing, content summarization at scale. Not suitable for live chatbot conversations.

3. Cap output token length

Set max_tokensaggressively. Most chatbot responses don't need to be more than 200 tokens. Setting a hard cap prevents runaway responses that drive up cost unpredictably. Industry pattern: 200-token cap for support bots, 500-token cap for informational bots, 1500-token cap for long-form content generation.

4. Use smaller models for classification, larger for generation

Architect your bot as a two-stage pipeline: GPT-4o mini classifies the user's intent and routes to a handler; only complex generation handlers escalate to GPT-4o. This routing pattern can cut your overall LLM bill 60-80% while preserving quality for the generation-critical paths.

5. Tier-down embeddings unless quality matters

Use text-embedding-3-small ($0.02 per 1M tokens) instead of text-embedding-3-large ($0.13 per 1M tokens) for typical RAG knowledge bases. The quality difference is real but small — and most production RAG systems are bottlenecked by chunking strategy and retrieval logic, not embedding quality.

OpenAI vs Anthropic vs Gemini — cost landscape

OpenAI is not always the cheapest choice. As of 2026, Anthropic's Claude family (Haiku, Sonnet, Opus) competes directly on the cost-vs-quality frontier. Google's Gemini Flash family is aggressively priced for high-volume workloads. For multi-provider comparison, see our LLM API Cost Calculator which extends this OpenAI-specific calculator to compare across providers.

Most production chatbots today use a multi-model architecture: cheap fast models for classification + intent recognition, premium models for nuanced generation, fallback models for cost-overage protection. The cost calculator above models OpenAI-only deployments; the cross-provider calculator models the multi-vendor architecture.

Related Chatbotscape resources

FAQ

Which OpenAI model is cheapest for chatbots?

GPT-4o mini at $0.15 input + $0.60 output per 1M tokens. Run the calculator above with your specific volume to see exact monthly numbers; the cheapest model can vary by use case when you weight quality.

How accurate is conversation-mode token estimation?

The calculator uses OpenAI's documented heuristic of ~0.75 tokens per English word, with language multipliers for non-English. For production accuracy, switch to token mode with direct telemetry, or integrate js-tiktoken in your application for exact counting.

What is prompt caching?

OpenAI's feature that lets you cache repeated context (typically system prompts) for 5 minutes between calls. Cached portion costs ~10% of full input price. Toggle it in the calculator above to see the effect on your monthly cost.

When should I use Batch API?

For workloads where 24-hour latency is acceptable: bulk classification, embedding generation, content summarization at scale. Never for realtime chatbots. The 50% discount on both input and output makes it the single largest cost lever for batch-friendly workloads.

How often is pricing refreshed?

Weekly, from OpenAI's official pricing page. Current data last refreshed 2026-05-26.

Can I embed this calculator on my site?

Yes — free. Copy the iframe snippet from the embed section above. Embed strips Chatbotscape navigation, keeps the calculator + attribution badge.

Does it include GPT-5 and o1 models?

Yes — full GPT-5 family, o1-preview, o1-mini, plus all GPT-4 family variants and both text-embedding-3 models.

How do I convert monthly conversations to API tokens?

The calculator does this for you in conversation mode using the formula: conversations × messages per conversation × words per message × (0.75 tokens-per-word × language multiplier). Half is treated as input, half as output.