Compare Models
Select up to 4 models to compare benchmarks, pricing, and capabilities side by side.
Models compared: o3-mini (OpenAI), Claude Sonnet 4 (Anthropic), and DeepSeek-R1-Distill-Qwen-32B (DeepSeek).

Benchmark scores (%)

| Benchmark | o3-mini | Claude Sonnet 4 | DeepSeek-R1-Distill-Qwen-32B |
|---|---|---|---|
| MMLU | 86.9 | 88.7 | 86.0 |
| HumanEval | 92.9 | 93.7 | 85.0 |
| GSM8K | 97.9 | 96.4 | 96.0 |
| GPQA | 77.0 | 68.2 | 62.0 |
| MGSM | 89.5 | 91.6 | N/A |
| ARC-Challenge | 96.0 | 96.7 | N/A |
| HellaSwag | 92.5 | 93.2 | N/A |
| MATH | 97.0 | 78.0 | 94.0 |
| SWE-bench | 49.3 | 53.6 | N/A |
| MMMLU | 83.5 | 86.0 | N/A |
| AIME 2025 | N/A | N/A | 72.0 |

N/A = no score reported.

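The page does not say how it aggregates these scores into a single ranking. As a minimal sketch, assuming two common aggregations (counting per-benchmark wins, and averaging over the benchmarks every model reports), the scores can be summarized as follows. Entries with no reported score are treated as missing (`None`) rather than as 0, so they do not drag an average down. The names `SCORES`, `leader_per_benchmark`, and `mean_on_shared` are hypothetical helpers introduced for illustration.

```python
from collections import Counter

MODELS = ("o3-mini", "Claude Sonnet 4", "DeepSeek-R1-Distill-Qwen-32B")

# Scores copied from the benchmark table, in MODELS order.
# None marks a benchmark with no reported score (hypothetical encoding).
SCORES = {
    "MMLU": (86.9, 88.7, 86.0),
    "HumanEval": (92.9, 93.7, 85.0),
    "GSM8K": (97.9, 96.4, 96.0),
    "GPQA": (77.0, 68.2, 62.0),
    "MGSM": (89.5, 91.6, None),
    "ARC-Challenge": (96.0, 96.7, None),
    "HellaSwag": (92.5, 93.2, None),
    "MATH": (97.0, 78.0, 94.0),
    "SWE-bench": (49.3, 53.6, None),
    "MMMLU": (83.5, 86.0, None),
    "AIME 2025": (None, None, 72.0),
}


def leader_per_benchmark(scores):
    """Map each benchmark to the model with the highest reported score."""
    leaders = {}
    for bench, row in scores.items():
        # Skip models without a score instead of treating them as 0.
        reported = [(score, model) for model, score in zip(MODELS, row) if score is not None]
        leaders[bench] = max(reported, key=lambda pair: pair[0])[1]
    return leaders


def mean_on_shared(scores):
    """Average each model over the benchmarks that all models report."""
    shared = [bench for bench, row in scores.items() if None not in row]
    means = {
        model: round(sum(scores[bench][i] for bench in shared) / len(shared), 2)
        for i, model in enumerate(MODELS)
    }
    return shared, means


if __name__ == "__main__":
    wins = Counter(leader_per_benchmark(SCORES).values())
    shared, means = mean_on_shared(SCORES)
    print("Benchmark wins:", dict(wins))
    print("Mean on", shared, "->", means)
```

On this data the two aggregations disagree: Claude Sonnet 4 leads on 7 of the 11 benchmarks, while o3-mini has the highest mean (90.34) over the five benchmarks all three models report (MMLU, HumanEval, GSM8K, GPQA, MATH). Which model looks "best" therefore depends on the aggregation chosen.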

| Model | Input ($ / 1M tokens) | Output ($ / 1M tokens) | Blended* ($ / 1M tokens) |
|---|---|---|---|
| o3-mini | $1.10 | $4.40 | $2.75 |
| Claude Sonnet 4 | $3.00 | $15.00 | $9.00 |
| DeepSeek-R1-Distill-Qwen-32B | $0.12 | $0.18 | $0.15 |

*Blended = simple average of input and output price, (input + output) / 2.

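A 1:1 blend assumes a request reads as many tokens as it writes, which many workloads do not. The sketch below reproduces the Blended column and lets you weight it toward input or output, plus estimate the cost of one request. The `input_share` parameter and the 3:1 input-heavy mix in the usage are illustrative assumptions, not figures from this page; `blended_price` and `request_cost` are hypothetical helpers.

```python
# USD per 1M tokens as (input, output), copied from the pricing table.
PRICES = {
    "o3-mini": (1.10, 4.40),
    "Claude Sonnet 4": (3.00, 15.00),
    "DeepSeek-R1-Distill-Qwen-32B": (0.12, 0.18),
}


def blended_price(model, input_share=0.5):
    """Weighted average price per 1M tokens.

    input_share=0.5 reproduces the table's simple average; other values
    are assumptions about a workload's input/output token mix.
    """
    inp, out = PRICES[model]
    return inp * input_share + out * (1 - input_share)


def request_cost(model, input_tokens, output_tokens):
    """Dollar cost of a single request with the given token counts."""
    inp, out = PRICES[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000


if __name__ == "__main__":
    for name in PRICES:
        # 1:1 blend (matches the table) vs an assumed 3:1 input-heavy mix.
        print(name, round(blended_price(name), 2), round(blended_price(name, 0.75), 2))
    print(round(request_cost("Claude Sonnet 4", 10_000, 1_000), 4))
```

For Claude Sonnet 4, shifting to a 3:1 input-heavy mix lowers the effective price from $9.00 to $6.00 per 1M tokens. Reasoning models such as o3-mini bill their reasoning tokens as output tokens, so for them a 1:1 blend can understate real cost.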
| Spec | o3-mini | Claude Sonnet 4 | DeepSeek-R1-Distill-Qwen-32B |
|---|---|---|---|
| Context window (tokens) | 200K | 200K | 128K |
| Max output (tokens) | 100K | 16K | 8K |
| Time to first token (TTFT) | 800 ms | 280 ms | 300 ms |
| Output speed | 75 tok/s | 100 tok/s | 100 tok/s |
| Parameters | N/A | N/A | 32B |
| Architecture | Transformer + CoT | Transformer | Transformer + CoT (distilled) |
| Open Source | No | No | Yes |
| Tier | mid | mid | mid |
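TTFT and output speed combine into an end-to-end response time. A rough sketch, assuming a linear model (wait TTFT, then stream every output token at the listed constant speed), is below. It ignores network overhead, load-dependent variance, and hidden reasoning tokens, which can add substantial time for o3-mini. `LATENCY`, `response_time`, and `fastest` are hypothetical names.

```python
# (TTFT in seconds, output speed in tokens/s), copied from the spec table.
LATENCY = {
    "o3-mini": (0.800, 75),
    "Claude Sonnet 4": (0.280, 100),
    "DeepSeek-R1-Distill-Qwen-32B": (0.300, 100),
}


def response_time(model, output_tokens):
    """Rough wall-clock seconds: wait TTFT, then stream at constant speed."""
    ttft_s, tokens_per_s = LATENCY[model]
    return ttft_s + output_tokens / tokens_per_s


def fastest(output_tokens):
    """Model with the lowest estimated response time for this output length."""
    return min(LATENCY, key=lambda m: response_time(m, output_tokens))


if __name__ == "__main__":
    for name in LATENCY:
        print(f"{name}: {response_time(name, 500):.2f} s for 500 tokens")
    print("Fastest:", fastest(500))
```

Under this model a 500-token reply takes about 7.47 s on o3-mini, 5.28 s on Claude Sonnet 4, and 5.30 s on DeepSeek-R1-Distill-Qwen-32B. Because the last two share the same output speed, the 20 ms TTFT gap is what separates them at every output length.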
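Quick Verdict

- Best Performance: o3-mini (top scores on GSM8K, GPQA, and MATH)
- Best Value: DeepSeek-R1-Distill-Qwen-32B (lowest blended price, $0.15 per 1M tokens)
- Fastest: Claude Sonnet 4 (lowest TTFT at 280 ms; output speed tied with DeepSeek-R1-Distill-Qwen-32B at 100 tok/s)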
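The page does not define how it scores "Best Value". One hedged reading, assuming value means average benchmark score per dollar, divides each model's mean over the five benchmarks all three models report by its blended price. The metric and the names `SHARED_SCORES`, `BLENDED_PRICE`, and `points_per_dollar` are illustrative assumptions, not the site's method.

```python
# Scores on the five benchmarks all three models report (MMLU, HumanEval,
# GSM8K, GPQA, MATH) and blended USD per 1M tokens, copied from the tables.
SHARED_SCORES = {
    "o3-mini": (86.9, 92.9, 97.9, 77.0, 97.0),
    "Claude Sonnet 4": (88.7, 93.7, 96.4, 68.2, 78.0),
    "DeepSeek-R1-Distill-Qwen-32B": (86.0, 85.0, 96.0, 62.0, 94.0),
}
BLENDED_PRICE = {
    "o3-mini": 2.75,
    "Claude Sonnet 4": 9.00,
    "DeepSeek-R1-Distill-Qwen-32B": 0.15,
}


def points_per_dollar(model):
    """Mean shared-benchmark score divided by blended price per 1M tokens."""
    scores = SHARED_SCORES[model]
    return (sum(scores) / len(scores)) / BLENDED_PRICE[model]


if __name__ == "__main__":
    for name in SHARED_SCORES:
        print(f"{name}: {points_per_dollar(name):.1f} points per $ (per 1M tokens)")
    print("Best value:", max(SHARED_SCORES, key=points_per_dollar))
```

Under this metric DeepSeek-R1-Distill-Qwen-32B scores about 564 points per dollar, against roughly 32.9 for o3-mini and 9.4 for Claude Sonnet 4. Its price advantage outweighs its slightly lower average score, which is consistent with the Best Value verdict.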