by Amazon · 2 months ago
Speech-to-speech model for natural real-time conversations. Supports 7 languages.
Input Price
$0.50/M tokens
Output Price
$0.50/M tokens
Performance Profile
Reliable audio processing with strong multi-language support
Processes audio in real time across its 7 supported languages
vs similar-tier models
| Model | Input ($/M tokens) | Output ($/M tokens) | Context | Avg Score |
|---|---|---|---|---|
| Amazon Nova 2 Sonic (current), Amazon | $0.50 | $0.50 | N/A | 0.0 |
| o3-mini, OpenAI | $1.10 | $4.40 | 200K | 86.3 |
| DeepSeek-R1, DeepSeek | $0.55 | $2.19 | 128K | 87.0 |
Transcribe a 1-minute clip
<$0.001 · Short voice memo → text
1,500 in · 200 out
Transcribe a 30-min meeting
$0.025 · Full meeting → transcript with speakers
45,000 in · 6,000 out
Process 1 hour of audio
$0.051 · Podcast episode → transcript + summary
90,000 in · 12,000 out
Transcribe 8 hours (full day)
$0.408 · Call center daily volume
720,000 in · 96,000 out
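The per-task figures above can be reproduced from the listed token counts and the $0.50/M rate that applies to both input and output. The following is a minimal sketch of that arithmetic, assuming cost is simply tokens multiplied by the per-million rate; the helper name `task_cost` is hypothetical and is not part of any Amazon API.

```python
INPUT_PRICE_PER_M = 0.50   # USD per million input tokens (from the listing)
OUTPUT_PRICE_PER_M = 0.50  # USD per million output tokens (from the listing)


def task_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one task (hypothetical helper)."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Token counts copied from the examples in the listing.
examples = {
    "1-minute clip": (1_500, 200),
    "30-min meeting": (45_000, 6_000),
    "1 hour of audio": (90_000, 12_000),
    "8 hours (full day)": (720_000, 96_000),
}

for name, (tokens_in, tokens_out) in examples.items():
    print(f"{name}: ${task_cost(tokens_in, tokens_out):.5f}")
```

The unrounded results are close to the displayed prices; the listing appears to round or truncate them for display (for example, the 30-minute meeting works out to $0.0255).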
Voice memos
$26/mo
$0.85/day
Meeting transcripts
$765/mo
$26/day
Podcast processing
$1,530/mo
$51/day
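The listing does not state the volume behind these daily and monthly figures. They are consistent with roughly 1,000 tasks per day over a 30-day month at $0.50/M tokens, but that volume is an inference from the numbers, not something the page confirms. A minimal sketch under those assumptions (the function name `projected_costs` and both default values are hypothetical):

```python
PRICE_PER_M = 0.50      # USD per million tokens, input and output (from the listing)
TASKS_PER_DAY = 1_000   # ASSUMPTION: inferred from the figures, not stated in the listing
DAYS_PER_MONTH = 30     # ASSUMPTION: a 30-day month


def projected_costs(input_tokens, output_tokens,
                    tasks_per_day=TASKS_PER_DAY, days=DAYS_PER_MONTH):
    """Return (daily_cost, monthly_cost) in USD for a repeated task."""
    per_task = (input_tokens + output_tokens) * PRICE_PER_M / 1_000_000
    daily = per_task * tasks_per_day
    return daily, daily * days


# Per-task token counts taken from the listing's usage examples.
workloads = {
    "Voice memos": (1_500, 200),
    "Meeting transcripts": (45_000, 6_000),
    "Podcast processing": (90_000, 12_000),
}

for name, (tokens_in, tokens_out) in workloads.items():
    daily, monthly = projected_costs(tokens_in, tokens_out)
    print(f"{name}: ${daily:.2f}/day, ${monthly:.2f}/mo")
```

Under these assumptions the voice-memo workload comes to $0.85/day and $25.50/mo, which the listing appears to show rounded to $26/mo. Adjust `tasks_per_day` to match your own volume.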
| Provider | Description | Input | Output | Context |
|---|---|---|---|---|
| Amazon | Most intelligent Amazon model for complex multi-step reasoning and agentic workflows. | $4.00/M | $12.00/M | 1.0M |
| Amazon | Fast, cost-effective reasoning model with built-in code interpreter and web grounding. | $0.80/M | $2.40/M | 1.0M |
| Amazon | Image generation model with fine-grained control over composition, style, and content. | $0.04/M | $0.04/M | |
| OpenAI | OpenAI's efficient reasoning model, optimized for speed while maintaining strong analytical capabilities. | $1.10/M | $4.40/M | 200K |
| DeepSeek | DeepSeek's reasoning model with transparent chain-of-thought. Open-source and highly competitive. | $0.55/M | $2.19/M | 128K |
| Anthropic | Anthropic's best balance of intelligence and speed. Excellent for production workloads. | $3.00/M | $15.00/M | 200K |