Name: InternVL3 78B
Price: 0.40 USD per 1M input tokens
Author: Shanghai AI Lab

Why Choose InternVL3 78B

Frontier-tier performance at $0.40/M input tokens

128K token context window — handles lengthy documents with ease

Supports text + image — true multimodal capability

Fully open source — self-host, fine-tune, and customize without restrictions

Strengths & Limitations

Strengths

+ Solid benchmark performance
+ Excellent math performance
+ Excellent knowledge performance
+ Large context window for complex tasks

Limitations

No significant limitations identified

Benchmark Results

MMLU: 84.0

GPQA: 50.0

HumanEval: 78.0

HellaSwag: 86.0

GSM8K: 88.0

Quick Comparison

vs similar-tier models

Model	Provider	Input ($/M tokens)	Output ($/M tokens)	Context	Avg Score
InternVL3 78B (current)	Shanghai AI Lab	$0.40	$1.20	128K	77.2
GPT-4o	OpenAI	$2.50	$10.00	128K	81.1
Kimi K2.5	Moonshot AI	$0.45	$2.20	256K	92.3
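The page does not say how Avg Score is derived. As a rough check, assuming it is the unweighted mean of the five scores listed under Benchmark Results, the sketch below reproduces the 77.2 shown for InternVL3 78B. The function name `average_score` is illustrative, and the same assumption cannot be checked for the other models because their per-benchmark scores are not listed here.

```python
from statistics import mean

# Benchmark scores for InternVL3 78B, as listed under Benchmark Results.
INTERNVL3_78B_SCORES = {
    "MMLU": 84.0,
    "GPQA": 50.0,
    "HumanEval": 78.0,
    "HellaSwag": 86.0,
    "GSM8K": 88.0,
}


def average_score(scores: dict[str, float]) -> float:
    """Unweighted mean of the benchmark scores (an assumed definition of Avg Score)."""
    return mean(scores.values())


if __name__ == "__main__":
    print(f"Avg Score: {average_score(INTERNVL3_78B_SCORES):.1f}")
```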


Pricing Calculator

How pricing works

A token is roughly ¾ of a word, so a 1,000-word article is about 1,333 tokens. You pay separately for input (what you send) and output (what the model replies).
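The ¾-of-a-word rule of thumb can be turned into a quick estimator. This is a minimal sketch: `estimate_tokens` is a hypothetical helper, and real token counts depend on the tokenizer and the language of the text, so treat the result as a ballpark figure only.

```python
WORDS_PER_TOKEN = 0.75  # rule of thumb from the text: one token is roughly 3/4 of a word


def estimate_tokens(word_count: int) -> int:
    """Approximate token count for a text with the given number of words."""
    return round(word_count / WORDS_PER_TOKEN)


if __name__ == "__main__":
    print(estimate_tokens(1000))  # a 1,000-word article
```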

Describe a single image

<$0.001

Photo → detailed description

1,000 in · 200 out

Analyze a chart or diagram

$0.0014

Visual data → structured insights

2,000 in · 500 out

OCR a 10-page document

$0.0096

Scanned pages → structured text

15,000 in · 3,000 out

Batch process 100 images

$0.064

Bulk image analysis pipeline

100,000 in · 20,000 out
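The per-request figures above follow from the listed rates of $0.40 per 1M input tokens and $1.20 per 1M output tokens. The sketch below recomputes them. `request_cost` and the scenario names are illustrative, and the token counts are the page's example sizes rather than measured values.

```python
INPUT_PRICE_PER_M = 0.40   # USD per 1M input tokens (from the comparison table)
OUTPUT_PRICE_PER_M = 1.20  # USD per 1M output tokens (from the comparison table)


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request, billing input and output tokens separately."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Example workloads from the pricing calculator: (input tokens, output tokens).
SCENARIOS = {
    "Describe a single image": (1_000, 200),
    "Analyze a chart or diagram": (2_000, 500),
    "OCR a 10-page document": (15_000, 3_000),
    "Batch process 100 images": (100_000, 20_000),
}

if __name__ == "__main__":
    for name, (tokens_in, tokens_out) in SCENARIOS.items():
        print(f"{name}: ${request_cost(tokens_in, tokens_out):.5f}")
```

Because output tokens cost three times as much as input tokens at these rates, workloads with long replies are more expensive than their total token count alone suggests.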

At scale: 1,000 requests/day

Image descriptions

$19/mo

$0.64/day

Document OCR

$288/mo

$9.60/day

Batch image analysis

$1,920/mo

$64/day
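The at-scale figures are consistent with multiplying each per-request cost by 1,000 requests per day and then by a 30-day month. The 30-day month is an assumption inferred from the numbers, not stated on the page, and `scale_cost` is a hypothetical helper.

```python
REQUESTS_PER_DAY = 1_000
DAYS_PER_MONTH = 30  # assumption: the monthly figures match a 30-day month


def scale_cost(cost_per_request: float,
               requests_per_day: int = REQUESTS_PER_DAY,
               days_per_month: int = DAYS_PER_MONTH) -> tuple[float, float]:
    """Return (daily, monthly) cost in USD for a steady request volume."""
    daily = cost_per_request * requests_per_day
    return daily, daily * days_per_month


# Per-request costs in USD, taken from the pricing calculator scenarios.
PER_REQUEST = {
    "Image descriptions": 0.00064,
    "Document OCR": 0.0096,
    "Batch image analysis": 0.064,
}

if __name__ == "__main__":
    for name, cost in PER_REQUEST.items():
        daily, monthly = scale_cost(cost)
        print(f"{name}: ${daily:,.2f}/day, ${monthly:,.2f}/mo")
```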

Technical Specifications

Provider: Shanghai AI Lab

Architecture: Vision-Language Transformer

Parameters: 78B

Context Window: 128K tokens

Modalities: text, image

Open Source: Yes

Release Date: July 1, 2025


More from Shanghai AI Lab

InternLM2.5 20B

Shanghai AI Lab

mid

Open-source model with 1M context from Shanghai AI Lab. Strong coding and math skills.

text, code

Input

$0.06/M

Output

$0.06/M

Context

1.0M

InternVL2 26B

Shanghai AI Lab

mid

Open-source vision-language model with strong image understanding capabilities.

text, image

Input

$0.08/M

Output

$0.08/M

Context

8K

InternLM3 8B

Shanghai AI Lab

budget

Latest InternLM series model. Efficient for research and application development.

text

Input

$0.07/M

Output

$0.14/M

Context

128K

Similar Frontier Models

GPT-4o

OpenAI

frontier

OpenAI's most advanced multimodal model. Excels at text, vision, and audio tasks with fast response times.

text, image, audio

Input

$2.50/M

Output

$10.00/M

Context

128K

Kimi K2.5

Moonshot AI

frontier

Moonshot AI's frontier multimodal MoE model with 1T total parameters (32B active). Tops SWE-bench and AIME 2025 benchmarks.

text, image, code

Input

$0.45/M

Output

$2.20/M

Context

256K

Gemini 2.5 Pro

Google

frontier

Google's most capable thinking model with breakthrough performance on reasoning and coding.

text, image, audio, video, code

Input

$1.25/M

Output

$10.00/M

Context

1.0M
