Llama 4: Meta's Multimodal MoE Models Launch with Scout and Maverick
Meta releases two Llama 4 variants: Scout with 10M context and Maverick with 400B parameters, both using MoE architecture.
GPTUni Team
Meta has released Llama 4, introducing two variants that represent a significant architectural shift from previous Llama models. Both Scout and Maverick use a Mixture-of-Experts design, breaking from the dense transformer approach of Llama 3.
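The core idea behind a Mixture-of-Experts layer is that a small router network selects a handful of expert sub-networks per token, so only a fraction of the weights participate in each forward pass. The following is a toy sketch of top-k expert routing in NumPy; it is illustrative only, not Meta's implementation, and all names and shapes here are hypothetical.

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=1):
    """Toy MoE layer: route each token to its top-k experts, mix outputs.

    x       : (tokens, d) input activations
    gate_w  : (d, n_experts) router weights (hypothetical names)
    experts : list of (d, d) expert weight matrices
    k       : experts active per token
    """
    logits = x @ gate_w                          # (tokens, n_experts)
    topk = np.argsort(logits, axis=-1)[:, -k:]   # indices of top-k experts
    # Softmax over only the selected experts' logits.
    sel = np.take_along_axis(logits, topk, axis=-1)
    weights = np.exp(sel - sel.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)

    out = np.zeros_like(x)
    for t in range(x.shape[0]):          # per token
        for j in range(k):               # per selected expert
            e = topk[t, j]
            out[t] += weights[t, j] * (x[t] @ experts[e])
    return out

rng = np.random.default_rng(0)
d, n_experts = 8, 4
x = rng.normal(size=(3, d))
gate_w = rng.normal(size=(d, n_experts))
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]
y = moe_forward(x, gate_w, experts, k=1)
print(y.shape)
```

With k=1, each token's output is computed by a single expert, so per-token compute scales with one expert's size rather than the sum of all experts, which is what allows total parameter counts far above the active count.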
Llama 4 Scout is a 109B-parameter model with 17B active parameters and a 10-million-token context window. The massive context window is the headline feature, enabling use cases like analyzing entire codebases, processing book-length documents, and maintaining very long conversation histories. On benchmarks, Scout scores well on coding and general reasoning tasks while maintaining fast inference speeds thanks to its relatively small active parameter count.
Llama 4 Maverick is the larger variant at 400B total parameters with 17B active. It uses 128 experts and supports a 1M-token context window. Maverick targets higher-quality outputs for tasks requiring deeper reasoning, achieving competitive scores on MMLU, HumanEval, and GPQA.
Both models are released under Meta's Llama community license, which permits commercial use subject to the license's scale-based terms. They are natively multimodal, supporting text and image inputs. Meta has also released the model weights in formats optimized for popular inference frameworks.
The Llama 4 release represents Meta's bet on the MoE architecture for scaling open-weight AI. By keeping active parameters low while increasing total capacity, the models can run efficiently on available hardware while approaching the performance of much larger dense models.
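The efficiency argument comes down to simple arithmetic on the numbers quoted above: both variants activate the same 17B parameters per token, so per-token compute is similar even though Maverick holds nearly four times Scout's total weights. A quick sketch, using only the figures stated in this article:

```python
# Headline parameter counts from the announcement, in billions.
models = {
    "Scout":    {"total_b": 109, "active_b": 17},
    "Maverick": {"total_b": 400, "active_b": 17},
}

for name, m in models.items():
    frac = m["active_b"] / m["total_b"]
    print(f"{name}: {m['active_b']}B of {m['total_b']}B weights "
          f"active per token ({frac:.1%})")
```

Roughly 16% of Scout's weights and about 4% of Maverick's are active for any given token, which is why a 400B-parameter model can have per-token inference cost closer to a 17B dense model (total weights still have to fit in memory, however).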