by Microsoft · 1 year ago
Microsoft's 5.6B compact model unifying text, vision, and speech in a single architecture.
Context Window: 128K
Max Output: 8K
TTFT: 60 ms
Speed: 200 tok/s
Input Price: $0.02/M tokens
Output Price: $0.02/M tokens
Performance Profile
Budget-friendly at just $0.02/M input tokens
128K token context window — handles lengthy documents with ease
Supports text + image + audio — true multimodal capability
Open source: self-host, fine-tune, and customize under the model's license
vs similar-tier models
| Model | Input ($/M tokens) | Output ($/M tokens) | Context | Avg Score |
|---|---|---|---|---|
| Phi-4-multimodal (Microsoft, current) | $0.02 | $0.02 | 128K | 73.7 |
| Claude Haiku 3.5 (Anthropic) | $0.80 | $4.00 | 200K | 77.0 |
| Mistral Small (Mistral AI) | $0.10 | $0.30 | 32K | 69.8 |
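To make the price gap in the table concrete, the sketch below prices one identical request on each of the three models. It assumes the table's Input and Output columns are USD per million tokens, and it borrows the 15,000-in / 3,000-out token counts from this page's OCR example. The `PRICES` dictionary and `request_cost` helper are hypothetical names used only for illustration.

```python
# Prices copied from the comparison table, assumed to be USD per million tokens.
PRICES = {
    "Phi-4-multimodal": (0.02, 0.02),
    "Claude Haiku 3.5": (0.80, 4.00),
    "Mistral Small": (0.10, 0.30),
}

def request_cost(model: str, tokens_in: int, tokens_out: int) -> float:
    """USD cost of one request on the given model (hypothetical helper)."""
    price_in, price_out = PRICES[model]
    return (tokens_in * price_in + tokens_out * price_out) / 1_000_000

# Workload borrowed from the page's 10-page OCR example.
TOKENS_IN, TOKENS_OUT = 15_000, 3_000

baseline = request_cost("Phi-4-multimodal", TOKENS_IN, TOKENS_OUT)
for model in PRICES:
    cost = request_cost(model, TOKENS_IN, TOKENS_OUT)
    print(f"{model}: ${cost:.5f} ({cost / baseline:.1f}x Phi-4-multimodal)")
```

Because output tokens are priced higher than input tokens on the other two models, the multiple grows as the share of output tokens in a workload grows.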
| Task | What it does | Tokens (in · out) | Est. cost |
|---|---|---|---|
| Describe a single image | Photo → detailed description | 1,000 · 200 | <$0.001 |
| Analyze a chart or diagram | Visual data → structured insights | 2,000 · 500 | <$0.001 |
| OCR a 10-page document | Scanned pages → structured text | 15,000 · 3,000 | <$0.001 |
| Batch process 100 images | Bulk image analysis pipeline | 100,000 · 20,000 | $0.0024 |
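The per-task estimates above follow directly from the listed price of $0.02 per million tokens for both input and output. A minimal sketch of that arithmetic, where `task_cost` is a hypothetical helper name and the token counts are the ones quoted on this page:

```python
# Listed Phi-4-multimodal prices, in USD per one million tokens.
INPUT_PRICE_PER_M = 0.02
OUTPUT_PRICE_PER_M = 0.02

def task_cost(tokens_in: int, tokens_out: int) -> float:
    """USD cost of a single request (hypothetical helper)."""
    return (tokens_in * INPUT_PRICE_PER_M + tokens_out * OUTPUT_PRICE_PER_M) / 1_000_000

# (input tokens, output tokens) per task, as quoted on the page.
TASKS = {
    "Describe a single image": (1_000, 200),
    "Analyze a chart or diagram": (2_000, 500),
    "OCR a 10-page document": (15_000, 3_000),
    "Batch process 100 images": (100_000, 20_000),
}

for name, (tokens_in, tokens_out) in TASKS.items():
    print(f"{name}: ${task_cost(tokens_in, tokens_out):.6f}")
```

The first three tasks come out well below a tenth of a cent, which is why the page shows them as <$0.001; only the 100-image batch is large enough to display an exact figure.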
| Workload | Monthly cost | Daily cost |
|---|---|---|
| Image descriptions | $0.72/mo | $0.02/day |
| Document OCR | $11/mo | $0.36/day |
| Batch image analysis | $72/mo | $2/day |
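The page does not state the request volume behind these monthly figures. They are consistent with roughly 1,000 requests per day over a 30-day month, using the per-task token counts listed earlier on the page; both the volume and the mapping of workloads to token counts are assumptions inferred from the numbers, not stated by the source. A sketch under those assumptions, with `projected_cost` as a hypothetical helper name:

```python
PRICE_PER_M = 0.02        # USD per million tokens, same for input and output
REQUESTS_PER_DAY = 1_000  # assumption: inferred, not stated on the page
DAYS_PER_MONTH = 30       # assumption

def projected_cost(tokens_in: int, tokens_out: int,
                   requests_per_day: int = REQUESTS_PER_DAY,
                   days: int = DAYS_PER_MONTH) -> tuple[float, float]:
    """Return (daily, monthly) USD cost for a repeated request (hypothetical helper)."""
    per_request = (tokens_in + tokens_out) * PRICE_PER_M / 1_000_000
    daily = per_request * requests_per_day
    return daily, daily * days

# Assumed mapping from each workload to the page's per-task token counts.
WORKLOADS = {
    "Image descriptions": (1_000, 200),
    "Document OCR": (15_000, 3_000),
    "Batch image analysis": (100_000, 20_000),
}

for name, (tokens_in, tokens_out) in WORKLOADS.items():
    daily, monthly = projected_cost(tokens_in, tokens_out)
    print(f"{name}: ${daily:.2f}/day, ${monthly:.2f}/mo")
```

Under these assumptions the unrounded values are about $0.024, $0.36, and $2.40 per day, so the page's $0.02/day, $11/mo, and $2/day entries appear to be rounded displays of the same figures.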
| Provider | Description | Input | Output | Context |
|---|---|---|---|---|
| Microsoft | Microsoft's compact open-source model with 128K context. Great for on-device inference. | $0.01/M | $0.01/M | 128K |
| Microsoft | Microsoft's open-source MoE model with 42B total params and only 6.6B active. | $0.06/M | $0.06/M | 128K |
| Microsoft | Microsoft's 14B open-source model with 128K context and strong reasoning capabilities. | $0.04/M | $0.04/M | 128K |
| Anthropic | Anthropic's fastest and most affordable model. Great for high-volume, low-latency tasks. | $0.80/M | $4.00/M | 200K |
| Mistral AI | Mistral's efficient model for everyday tasks. Fast and cost-effective. | $0.10/M | $0.30/M | 32K |
| OpenAI | A fast, affordable variant of GPT-4.1 for high-volume workloads. | $0.40/M | $1.60/M | 1.0M |