OpenAI API Pricing Calculator
Estimate your monthly OpenAI API cost in 30 seconds. Compare GPT-5, GPT-4o, GPT-4o mini, GPT-4.1, o1, and legacy models side-by-side. Model prompt caching and Batch API discounts. Free, no signup, pricing data refreshed weekly from OpenAI's official pricing page.
Calculate your OpenAI API monthly cost
Compare up to 4 OpenAI models side-by-side. Pricing data last refreshed 2026-05-26 from openai.com/api/pricing.
Easier for non-developers — we estimate tokens from your conversation volume.
Total user conversations expected across all bots and channels.
Counting both user and assistant turns. Typical SMB chatbot: 6-12.
Average words per turn. Short answers: 20-40. Long explanations: 80-150.
Non-English languages use more tokens per word (Russian ~1.6×, Chinese ~2.1×).
Cost optimizations
Monthly cost comparison
High-volume SMB chatbots — best cost-per-quality at scale
General-purpose multimodal — proven workhorse model
Balanced cost and quality for production chatbots
Simple classification, intent recognition, low-stakes copy
Cost-vs-quality recommendation
GPT-4o mini is 94% cheaper than GPT-4o while staying in the balanced quality tier. Best for: High-volume SMB chatbots — best cost-per-quality at scale.
Spread between cheapest and most expensive selected model: 94%.
Full per-model breakdown
| Model | Monthly input | Monthly output | Monthly total | Annual | Per conv |
|---|---|---|---|---|---|
GPT-4o mini in $0.0002/1k · out $0.0006/1k | $0.1350 | $0.5400 | $0.6750 | $8.10 | $0.0001 |
GPT-4o in $0.0025/1k · out $0.0100/1k | $2.25 | $9.00 | $11.25 | $135.00 | $0.0011 |
GPT-5 mini in $0.0010/1k · out $0.0040/1k | $0.9000 | $3.60 | $4.50 | $54.00 | $0.0005 |
GPT-3.5 Turbo in $0.0005/1k · out $0.0015/1k | $0.4500 | $1.35 | $1.80 | $21.60 | $0.0002 |
Volume basis: 900.0K input tokens + 900.0K output tokens per month across 10,000 conversations.
Save more — applicable optimizations
- −50%
Use Batch API for non-realtime workloads
Est. saves $9.11/month · medium difficulty · Learn how
- −25%
Enable prompt caching for repeat system instructions
Est. saves $4.56/month · easy difficulty · Learn how
- −20%
Cap output token length via max_tokens parameter
Est. saves $3.65/month · easy difficulty · Learn how
Embed this calculator on your site (free)
<iframe
src="https://chatbotscape.com/embed/tools/openai-api-pricing-calculator/"
width="100%" height="900" frameborder="0"
title="OpenAI API Pricing Calculator by Chatbotscape"
loading="lazy">
</iframe>How OpenAI's pricing actually works
OpenAI charges per token — not per request, not per message, not per «conversation». A token is roughly three-quarters of an English word (the exact ratio varies by language — Russian and Chinese consume substantially more tokens per word, see the language multiplier in the calculator's conversation mode). When you call the API, you pay for two things:
- Input tokens— everything you send to the model: your system prompt, the user's message, any prior conversation context you replay for memory, any retrieved documents from RAG.
- Output tokens — the model's response. Output is typically 3-4× more expensive per token than input across most OpenAI models. This asymmetry matters for cost optimization: long input contexts are cheaper than long outputs.
For a typical SMB customer support chatbot answering 10,000 conversations per month with 8 messages per conversation and 40 words per message, the calculator estimates roughly 480,000 input tokens and 480,000 output tokens per day — or about 14.4M input + 14.4M output tokens per month. At GPT-4o mini pricing, that's $2.16 input + $8.64 output = $10.80 per month. At GPT-4o pricing, the same workload costs $180 per month — a 17× cost increase for the upgrade.
When GPT-4o vs GPT-4o mini vs GPT-3.5 Turbo
The biggest cost-optimization lever in OpenAI pricing is model selection. Use the calculator above to compare exact numbers for your workload, but here's the rule-of-thumb decision framework:
- GPT-4o mini — default for production SMB chatbots. Handles intent classification, FAQ retrieval, lightweight reasoning, and most customer support scenarios at 6% of the cost of GPT-4o. Use this unless you have evidence GPT-4o is measurably better for your specific use case.
- GPT-4o— reach for it when conversation quality is the unique differentiator (premium B2B sales bots, healthcare triage, legal Q&A). The quality improvement over GPT-4o mini is real but workload-dependent; benchmark before committing.
- GPT-5 / GPT-5 mini— currently OpenAI's flagship for complex reasoning, agentic workflows with multiple tool calls, and high-quality content generation. GPT-5 mini is the new «balanced tier» sweet spot.
- GPT-3.5 Turbo— legacy. Useful only for cheap classification tasks where you specifically don't want any reasoning capability. Consider migrating to GPT-4o mini, which is comparable in cost and substantially better in quality.
- o1-preview / o1-mini— reasoning models that «think» before responding. Significantly more expensive per output token because they generate internal reasoning chains. Use only when the reasoning matters (math, complex code review, multi-step planning) — not for chat.
Top 5 cost optimization tactics
1. Enable prompt caching
If your prompts include a long static system instruction (e.g., your bot's persona, brand voice rules, escalation criteria), enable prompt caching. Cached input tokens cost about 10% of full price. For a typical bot with 300-token system prompt and 10,000 calls per month, this saves 20-30% on input cost. Implementation difficulty: easy — just add cache_control markers to your prompt structure.
2. Use Batch API for non-realtime workloads
OpenAI's Batch API gives you 50% off both input and output tokens in exchange for a 24-hour SLA. Perfect for: bulk classification, weekly embedding refreshes, large RAG knowledge base indexing, content summarization at scale. Not suitable for live chatbot conversations.
3. Cap output token length
Set max_tokensaggressively. Most chatbot responses don't need to be more than 200 tokens. Setting a hard cap prevents runaway responses that drive up cost unpredictably. Industry pattern: 200-token cap for support bots, 500-token cap for informational bots, 1500-token cap for long-form content generation.
4. Use smaller models for classification, larger for generation
Architect your bot as a two-stage pipeline: GPT-4o mini classifies the user's intent and routes to a handler; only complex generation handlers escalate to GPT-4o. This routing pattern can cut your overall LLM bill 60-80% while preserving quality for the generation-critical paths.
5. Tier-down embeddings unless quality matters
Use text-embedding-3-small ($0.02 per 1M tokens) instead of text-embedding-3-large ($0.13 per 1M tokens) for typical RAG knowledge bases. The quality difference is real but small — and most production RAG systems are bottlenecked by chunking strategy and retrieval logic, not embedding quality.
OpenAI vs Anthropic vs Gemini — cost landscape
OpenAI is not always the cheapest choice. As of 2026, Anthropic's Claude family (Haiku, Sonnet, Opus) competes directly on the cost-vs-quality frontier. Google's Gemini Flash family is aggressively priced for high-volume workloads. For multi-provider comparison, see our LLM API Cost Calculator which extends this OpenAI-specific calculator to compare across providers.
Most production chatbots today use a multi-model architecture: cheap fast models for classification + intent recognition, premium models for nuanced generation, fallback models for cost-overage protection. The cost calculator above models OpenAI-only deployments; the cross-provider calculator models the multi-vendor architecture.
Related Chatbotscape resources
- Large language model — definition + concepts
- System prompt — what it is + why size matters
- Chatbase review — OpenAI-powered chatbot platform
- Botpress review — multi-LLM chatbot platform
- Voiceflow review — visual designer + OpenAI
- WhatsApp Business API cost calculator
- How we source and refresh pricing data
FAQ
Which OpenAI model is cheapest for chatbots?
GPT-4o mini at $0.15 input + $0.60 output per 1M tokens. Run the calculator above with your specific volume to see exact monthly numbers; the cheapest model can vary by use case when you weight quality.
How accurate is conversation-mode token estimation?
The calculator uses OpenAI's documented heuristic of ~0.75 tokens per English word, with language multipliers for non-English. For production accuracy, switch to token mode with direct telemetry, or integrate js-tiktoken in your application for exact counting.
What is prompt caching?
OpenAI's feature that lets you cache repeated context (typically system prompts) for 5 minutes between calls. Cached portion costs ~10% of full input price. Toggle it in the calculator above to see the effect on your monthly cost.
When should I use Batch API?
For workloads where 24-hour latency is acceptable: bulk classification, embedding generation, content summarization at scale. Never for realtime chatbots. The 50% discount on both input and output makes it the single largest cost lever for batch-friendly workloads.
How often is pricing refreshed?
Weekly, from OpenAI's official pricing page. Current data last refreshed 2026-05-26.
Can I embed this calculator on my site?
Yes — free. Copy the iframe snippet from the embed section above. Embed strips Chatbotscape navigation, keeps the calculator + attribution badge.
Does it include GPT-5 and o1 models?
Yes — full GPT-5 family, o1-preview, o1-mini, plus all GPT-4 family variants and both text-embedding-3 models.
How do I convert monthly conversations to API tokens?
The calculator does this for you in conversation mode using the formula: conversations × messages per conversation × words per message × (0.75 tokens-per-word × language multiplier). Half is treated as input, half as output.