API Pricing

The cost structure for using AI models through cloud APIs, typically charged per token processed. Input tokens and output tokens usually have different prices, with output tokens costing more.

API pricing is how cloud AI providers charge for model usage. The standard pricing unit is dollars per million tokens, with separate rates for input tokens (your prompt) and output tokens (the model's response). Output tokens typically cost 2-5x as much as input tokens because output is generated one token at a time, with each new token requiring its own forward pass through the model, while all input tokens can be processed in parallel during the prefill phase.

Pricing varies widely across models and providers. As of early 2025, GPT-4o charges $2.50 per million input tokens and $10.00 per million output tokens. Claude 3.5 Sonnet charges $3.00 input and $15.00 output. Smaller models are much cheaper: GPT-4o-mini costs $0.15 input and $0.60 output, and Claude 3 Haiku costs $0.25 input and $1.25 output. Open-source models hosted through providers like Together AI, Fireworks, or Groq offer competitive rates, often $0.10-$0.80 per million tokens for capable models.

Understanding pricing helps estimate costs for your application. A customer support chatbot handling 10,000 conversations per day, with average inputs of 500 tokens and outputs of 300 tokens, would process about 8 million tokens daily: 5 million input and 3 million output. At GPT-4o rates, that is $12.50 for input (5 million tokens at $2.50 per million) and $30.00 for output (3 million tokens at $10.00 per million), or about $42.50 per day and roughly $1,275 over a 30-day month. The same workload on GPT-4o-mini would cost about $2.55 per day. These calculations are critical for production planning.
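The estimate above can be reproduced with a few lines of Python. This is a minimal sketch, not a billing tool: the `Rate` class, the `RATES` table, and the `daily_cost` function are illustrative names, the prices are the early-2025 figures quoted in this entry, and every conversation is assumed to have the same token counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rate:
    """Price in US dollars per million tokens."""
    input_per_million: float
    output_per_million: float


# Early-2025 list prices quoted in this entry; check current provider pricing.
RATES = {
    "gpt-4o": Rate(2.50, 10.00),
    "gpt-4o-mini": Rate(0.15, 0.60),
}


def daily_cost(rate: Rate, conversations: int, input_tokens: int, output_tokens: int) -> float:
    """Estimate one day's spend, assuming every conversation has the same size."""
    input_millions = conversations * input_tokens / 1_000_000
    output_millions = conversations * output_tokens / 1_000_000
    return input_millions * rate.input_per_million + output_millions * rate.output_per_million


# The customer support workload: 10,000 conversations, 500 tokens in, 300 tokens out.
for name, rate in RATES.items():
    cost = daily_cost(rate, conversations=10_000, input_tokens=500, output_tokens=300)
    print(f"{name}: ${cost:.2f} per day, ${cost * 30:.2f} per 30-day month")
```

Keeping input and output volumes separate matters because the two rates differ. Multiplying the combined 8 million tokens by a single rate would misstate the cost.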

Beyond per-token pricing, some providers offer additional pricing models. Cached input tokens (repeated prefixes across requests) are often discounted 50-90%. Batch processing APIs offer 50% discounts for non-time-sensitive workloads. Some providers offer provisioned throughput or committed-use discounts. Fine-tuned models sometimes have higher per-token costs to account for the dedicated infrastructure. GPTCrunch's pricing comparison tools help you navigate these options and find the most cost-effective model for your specific usage pattern.
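The caching and batch discounts described above can be folded into an estimate as follows. This is a rough sketch under stated assumptions: `effective_input_rate` and `batch_cost` are hypothetical helper names, the cached share of a prompt is something you would have to measure for your own traffic, and how discounts combine (or whether they can be combined at all) depends on the provider.

```python
def effective_input_rate(base_rate: float, cached_fraction: float, cache_discount: float) -> float:
    """Blend the full and discounted rates for a partly cached prompt.

    cached_fraction: share of input tokens served from the cache (0.0 to 1.0).
    cache_discount:  fraction taken off the price of cached tokens (0.5 = 50% off).
    """
    if not 0.0 <= cached_fraction <= 1.0 or not 0.0 <= cache_discount <= 1.0:
        raise ValueError("cached_fraction and cache_discount must be between 0 and 1")
    return base_rate * (1.0 - cached_fraction * cache_discount)


def batch_cost(standard_cost: float, batch_discount: float = 0.5) -> float:
    """Apply a flat batch discount to a cost computed at standard rates."""
    return standard_cost * (1.0 - batch_discount)


# Assumed scenario: 80% of input tokens hit the cache at a 50% discount,
# starting from a $2.50 per million input rate.
blended = effective_input_rate(2.50, cached_fraction=0.8, cache_discount=0.5)
print(f"Blended input rate: ${blended:.2f} per million tokens")

# Assumed scenario: a $42.50 daily workload moved entirely to a 50%-off batch API.
print(f"Batch cost: ${batch_cost(42.50):.2f} per day")
```

A long shared system prompt raises `cached_fraction`, so chat applications with a fixed prefix tend to benefit most from caching, while batch discounts suit offline jobs such as bulk classification.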
