GPTCrunch

Google

Explore all 24 AI models from Google. Compare benchmarks, pricing, and capabilities across the full model lineup.

deepmind.google

Total Models: 24
Avg Benchmark: 62.8
Open Source: 10
Modalities: 5


All Models

24 models

Gemini 3.1 Pro
Tier: frontier

Google's most capable model. 94.3% on GPQA Diamond, 80.6% on SWE-bench, 77.1% on ARC-AGI-2. #1 on 12 of 18 tracked benchmarks.

Modalities: text, image, audio, video, code
Input: $2.00/M | Output: $12.00/M | Context: 1.0M
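The Input and Output prices on these cards are quoted per million ("/M") units. As a rough sketch, assuming "/M" means per million tokens for the text models and ignoring any tiered or long-context pricing this page does not show, the cost of one request is input_tokens × input_price / 10^6 + output_tokens × output_price / 10^6. The model subset and the 100K-in / 10K-out workload below are illustrative choices, not figures from the page.

```python
# Listed (input, output) prices from this page, in US dollars per million units.
# Assumption: for these text models, "/M" means per million tokens.
PRICES_PER_M = {
    "Gemini 3.1 Pro": (2.00, 12.00),
    "Gemini 3 Flash": (0.50, 3.00),
    "Gemini 2.5 Flash": (0.15, 0.60),
    "Gemini 2.0 Flash-Lite": (0.07, 0.30),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at the listed per-million-token prices."""
    input_price, output_price = PRICES_PER_M[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 100K input tokens, 10K output tokens, cheapest first.
    workload = (100_000, 10_000)
    for name in sorted(PRICES_PER_M, key=lambda m: request_cost(m, *workload)):
        print(f"{name}: ${request_cost(name, *workload):.4f}")
```

Output tokens dominate the bill on the Pro-class models here (six times the input price on Gemini 3.1 Pro), so output-heavy workloads are where the tiers diverge most.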

Gemini 3 Flash
Tier: mid

Google's frontier-class model at Flash-level latency and cost. 90.4% on GPQA Diamond, 78% on SWE-bench, 1M context window.

Modalities: text, image, audio, video, code
Input: $0.50/M | Output: $3.00/M | Context: 1.0M

Gemini 3 Pro
Tier: frontier

Flagship Gemini 3 model with native multimodal understanding. Supports adjustable reasoning depth via the thinking_level parameter.

Modalities: text, image, audio, video, code
Input: $3.50/M | Output: $10.50/M | Context: 1.0M

Gemini 3 Deep Think
Tier: frontier

Specialized reasoning model designed for science, research, and complex engineering challenges.

Modalities: text, image, audio, video
Input: $5.00/M | Output: $15.00/M | Context: 1.0M

Veo 3.1
Tier: frontier

An enhanced iteration of Google DeepMind's Veo series that produces 8-second clips, which can be extended to up to 148 seconds through iterative generation. Veo 3.1 improves temporal consistency over long sequences, delivers higher-resolution output, and refines audio synchronization for extended storytelling and commercial content production.

Modalities: video, audio
Input: $3.00/M | Output: $80.00/M
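The card gives only the endpoints: an 8-second base clip and a maximum of 148 seconds after iterative extension. One decomposition consistent with those numbers is 8 + 20 × 7 = 148, which assumes each extension step appends a fixed 7 seconds and that at most 20 extensions are allowed. Both of those figures are assumptions, not stated on this page, so treat this helper as a hypothetical planning sketch.

```python
import math

BASE_CLIP_SECONDS = 8   # from the Veo 3.1 card
EXTENSION_SECONDS = 7   # assumption: seconds appended per extension step
MAX_EXTENSIONS = 20     # assumption: cap on the number of extension steps


def max_length_seconds() -> int:
    """Longest clip reachable under the assumed step size and cap."""
    return BASE_CLIP_SECONDS + MAX_EXTENSIONS * EXTENSION_SECONDS


def extensions_needed(target_seconds: float) -> int:
    """Minimum number of extension steps to reach target_seconds (hypothetical helper)."""
    if target_seconds <= BASE_CLIP_SECONDS:
        return 0
    steps = math.ceil((target_seconds - BASE_CLIP_SECONDS) / EXTENSION_SECONDS)
    if steps > MAX_EXTENSIONS:
        raise ValueError("target exceeds the maximum extended length")
    return steps


print(max_length_seconds())    # 148
print(extensions_needed(60))   # 8 steps -> 64 seconds
```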

Gemini 2.5 Flash Image
Tier: mid

A multimodal extension of Google's Gemini 2.5 Flash model that adds native image generation and editing capabilities alongside text understanding. This model enables conversational image creation, iterative visual refinement, and combined text-image output within a single interface, making it particularly effective for design iteration and creative brainstorming workflows.

Modalities: image, text
Input: $0.15/M | Output: $30.00/M

Imagen 4
Tier: frontier

Google DeepMind's fourth-generation image synthesis model capable of producing images up to 2K resolution with exceptional photorealism and compositional accuracy. Imagen 4 includes SynthID watermarking by default for responsible AI deployment, supports advanced inpainting and outpainting, and demonstrates industry-leading performance on text rendering and spatial reasoning tasks.

Modalities: image
Input: $4.00/M | Output: $20.00/M

Veo 3
Tier: frontier

Google DeepMind's flagship video generation model that natively produces joint audio-visual output in a single pass. Veo 3 leverages a Latent Diffusion Transformer to generate high-fidelity clips with synchronized dialogue, sound effects, and ambient audio without requiring a separate audio model. It demonstrates strong physical understanding and prompt adherence across diverse cinematic styles.

Modalities: video, audio
Input: $5.00/M | Output: $150.00/M

Gemini 2.5 Flash
Tier: mid

Google's fast and cost-efficient thinking model with strong reasoning capabilities.

Modalities: text, image, audio, video
Input: $0.15/M | Output: $0.60/M | Context: 1.0M

Gemini 2.5 Pro
Tier: frontier

The most capable model in the Gemini 2.5 family: a thinking model with breakthrough performance on reasoning and coding.

Modalities: text, image, audio, video, code
Input: $1.25/M | Output: $10.00/M | Context: 1.0M

Gemma 3 1B
Tier: budget

Smallest Gemma 3 model, for edge and mobile deployment. Text-only, with a 32K context window.

Modalities: text
Input: $0.02/M | Output: $0.02/M | Context: 32K

Gemma 3 27B
Tier: mid

Google's open-source multimodal model. Strong performance for its size with vision capabilities.

Modalities: text, image
Input: $0.10/M | Output: $0.10/M | Context: 128K

Gemma 3 12B
Tier: budget

Efficient open-source model from Google with multimodal capabilities at 12B parameters.

Modalities: text, image
Input: $0.05/M | Output: $0.05/M | Context: 128K

Gemma 3 4B
Tier: budget

Ultra-efficient open-source model from Google. Runs on mobile and edge devices.

Modalities: text, image
Input: $0.02/M | Output: $0.02/M | Context: 128K
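Whether a small Gemma 3 model fits on a phone or edge board is mostly a question of weight memory, which can be estimated as parameters × bits per weight / 8 bytes. This sketch uses the nominal parameter counts from the model names (real checkpoints differ somewhat) and counts weights only; activations, the KV cache, and runtime overhead come on top and are not modeled.

```python
# Nominal parameter counts taken from the model names on this page.
# Actual checkpoints may have slightly different parameter counts.
NOMINAL_PARAMS = {
    "Gemma 3 1B": 1e9,
    "Gemma 3 4B": 4e9,
    "Gemma 3 12B": 12e9,
}


def weight_memory_gb(params: float, bits_per_weight: int) -> float:
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


for name, params in NOMINAL_PARAMS.items():
    half = weight_memory_gb(params, 16)  # e.g. bf16 / fp16 weights
    q4 = weight_memory_gb(params, 4)     # e.g. 4-bit quantized weights
    print(f"{name}: ~{half:.1f} GB at 16-bit, ~{q4:.1f} GB at 4-bit")
```

Under this estimate the 4B model needs about 8 GB at 16-bit but about 2 GB at 4-bit, which is why quantization usually matters more than model choice for on-device deployment.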

PaliGemma2 28B
Tier: mid

Open vision-language model for image captioning, visual QA, and OCR tasks. Built on a Gemma 2 backbone.

Modalities: text, image
Input: $0.30/M | Output: $0.60/M | Context: 8K

PaliGemma2 10B
Tier: mid

Mid-size PaliGemma for efficient vision-language tasks. Strong OCR and document understanding.

Modalities: text, image
Input: $0.15/M | Output: $0.30/M | Context: 8K

Gemini 2.0 Flash
Tier: mid

Google's fastest multimodal model with native tool use and advanced agentic capabilities.

Modalities: text, image, audio, video
Input: $0.10/M | Output: $0.40/M | Context: 1.0M

Gemini 2.0 Flash-Lite
Tier: budget

Google's ultra-efficient model offering better performance than Gemini 1.5 Flash at the same cost point.

Modalities: text, image
Input: $0.07/M | Output: $0.30/M | Context: 1.0M

Gemini 2.0 Flash Thinking
Tier: mid

Experimental Gemini model with extended chain-of-thought reasoning. Transparent thinking process with strong performance on math and science.

Modalities: text, image
Input: $0.15/M | Output: $0.60/M | Context: 1.0M

Gemma 2 2B
Tier: budget

Smallest Gemma 2 model for efficient text processing on consumer hardware.

Modalities: text
Input: $0.02/M | Output: $0.04/M | Context: 8K

Gemma 2 9B
Tier: budget

Efficient open-source model from Google. Great performance-to-size ratio.

Modalities: text
Input: $0.03/M | Output: $0.03/M | Context: 8K

Gemma 2 27B
Tier: mid

Google's previous-generation open-source model with strong general capabilities.

Modalities: text
Input: $0.07/M | Output: $0.07/M | Context: 8K

CodeGemma 7B
Tier: budget

Google's open-source code-focused model based on the Gemma architecture.

Modalities: code
Input: $0.03/M | Output: $0.03/M | Context: 8K

Gemini 1.5 Pro
Tier: mid

Google's previous-generation flagship model, with a 2.1M-token context window, the longest in this lineup.

Modalities: text, image, audio, video
Input: $1.25/M | Output: $5.00/M | Context: 2.1M
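The context sizes on these cards (8K up to 2.1M) and the input prices can be combined into a simple selection rule: among the models whose context window holds the prompt, pick the one with the lowest input price. This is a hypothetical helper over a handful of the listed models. It assumes the rounded labels correspond to power-of-two token counts (1.0M = 1,048,576; 2.1M = 2,097,152; 128K = 131,072; 8K = 8,192), and it checks only the prompt, because output-token limits are not listed on this page.

```python
from typing import Optional

# (input price in $/M, context window in tokens), taken from this page's cards.
# Assumption: the rounded labels map to these exact token counts.
MODELS = {
    "Gemini 1.5 Pro": (1.25, 2_097_152),        # listed as 2.1M
    "Gemini 2.5 Flash": (0.15, 1_048_576),      # listed as 1.0M
    "Gemini 2.0 Flash-Lite": (0.07, 1_048_576), # listed as 1.0M
    "Gemma 3 27B": (0.10, 131_072),             # listed as 128K
    "Gemma 2 9B": (0.03, 8_192),                # listed as 8K
}


def cheapest_that_fits(prompt_tokens: int) -> Optional[str]:
    """Return the lowest-input-price model whose context holds the prompt, or None.

    Only the prompt length is checked; output budgets are ignored.
    """
    candidates = [
        (price, name)
        for name, (price, context) in MODELS.items()
        if prompt_tokens <= context
    ]
    return min(candidates)[1] if candidates else None


for n in (5_000, 50_000, 1_500_000, 3_000_000):
    print(n, cheapest_that_fits(n))
```

Once a prompt outgrows the 1.0M-class models, the only remaining option on this list is Gemini 1.5 Pro, at roughly 18 times the input price of Gemini 2.0 Flash-Lite.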
