Llama 4 Models
Two models, one family - from long context to frontier quality
The Llama 4 family features two MoE models: Scout for massive context (10M tokens) and Maverick for maximum quality (128 experts, 400B parameters). Both share 17B active parameters per token and native multimodal support.
All models
Choose the right Llama 4 for your use case
Scout and Maverick are optimized for different scenarios. Scout excels at long-context tasks, Maverick at maximum quality.
Llama 4 Scout
10M context window - the long-context specialist
109B total parameters across 16 experts with 17B active per token. The standout feature is its 10 million token context window - the longest of any openly available model.
Choose Scout when you need to process entire codebases, multi-document research sets, or very long conversation histories in a single call.
Llama 4 Maverick
128 experts, 400B parameters - the quality flagship
400B total parameters across 128 experts with 17B active per token. It outperforms GPT-4o on key benchmarks and is the default chat model on this site.
Choose Maverick when you need maximum quality for reasoning, coding, multimodal analysis, and complex task completion.
Long Context
Llama 4 Scout
109B total, 17B active, 16 experts. 10M token context window.
Best for: entire codebases, multi-document analysis, long research papers, extended conversations.
Flagship
Llama 4 Maverick
400B total, 17B active, 128 experts. Beats GPT-4o on benchmarks.
Best for: complex reasoning, code generation, multimodal tasks, research synthesis.
Shared capabilities
What both Llama 4 models can do
Scout and Maverick share a common set of capabilities built on Meta's MoE architecture.
Native multimodal
Both models process text and images natively via an early-fusion architecture: text and vision tokens feed into a single backbone, rather than being bolted together through a separate adapter pipeline.
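In practice, this means an image and a text prompt travel in the same message. Here is a minimal sketch against an OpenAI-compatible chat endpoint; the base URL, model id, and image URL are placeholders, not an official Meta service:

```python
from openai import OpenAI

# Placeholder endpoint and key: point these at whichever service hosts Llama 4 for you.
client = OpenAI(base_url="https://example.com/v1", api_key="YOUR_API_KEY")

response = client.chat.completions.create(
    model="llama-4-maverick",  # hypothetical model id; check your provider's catalog
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What does this diagram show?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/diagram.png"}},
        ],
    }],
)
print(response.choices[0].message.content)
```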
MoE efficiency
Both activate only 17B parameters per token. Scout uses 16 experts (109B total), Maverick uses 128 experts (400B total).
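To illustrate why the active-parameter count stays flat as the expert count grows, here is a toy top-1 routing sketch (not Meta's implementation; the expert count and hidden size are arbitrary). Each token is sent to a single expert chosen by a learned router, so per-token compute is the same whether the layer holds 16 experts or 128:

```python
import torch
import torch.nn as nn

class ToyMoELayer(nn.Module):
    """Toy top-1 MoE layer: each token activates exactly one expert,
    so compute per token stays constant as the expert count grows."""
    def __init__(self, d_model: int, n_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)  # learned routing scores
        self.experts = nn.ModuleList(
            [nn.Linear(d_model, d_model) for _ in range(n_experts)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, d_model)
        scores = self.router(x).softmax(dim=-1)          # (tokens, n_experts)
        top_score, top_idx = scores.max(dim=-1)          # pick one expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top_idx == i
            if mask.any():                               # run only the routed tokens
                out[mask] = top_score[mask, None] * expert(x[mask])
        return out

# 16 experts (Scout-like) vs 128 (Maverick-like): per-token work is identical.
layer = ToyMoELayer(d_model=64, n_experts=16)
print(layer(torch.randn(8, 64)).shape)  # torch.Size([8, 64])
```

Total parameters scale with the number of experts, but each forward pass only touches the router plus one expert per token: that is the efficiency the 17B-active figure reflects.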
Function calling
Built-in function calling across both models enables agentic workflows. No fine-tuning required for tool use.
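A minimal sketch of tool use through an OpenAI-compatible `tools` parameter, which many Llama hosting services support; the endpoint, model id, and `get_weather` tool are hypothetical placeholders:

```python
from openai import OpenAI

client = OpenAI(base_url="https://example.com/v1", api_key="YOUR_API_KEY")  # placeholder host

# Describe a tool; the model decides when to call it and with what arguments.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, for illustration only
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="llama-4-maverick",  # hypothetical model id
    messages=[{"role": "user", "content": "What's the weather in Lisbon?"}],
    tools=tools,
)
print(response.choices[0].message.tool_calls)  # arguments the model wants to pass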
Extended context
Scout: 10M tokens. Maverick: 1M tokens. Both far exceed previous generation limits.
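To sanity-check whether an input fits, the common ~4 characters per token heuristic gives a rough estimate; exact counts depend on the tokenizer, and the directory path below is a placeholder:

```python
from pathlib import Path

CHARS_PER_TOKEN = 4  # rough heuristic; real counts depend on the tokenizer

def estimate_tokens(root: str, suffixes: tuple = (".py", ".md", ".txt")) -> int:
    """Roughly estimate the token count of all matching files under a directory."""
    chars = sum(
        len(p.read_text(errors="ignore"))
        for p in Path(root).rglob("*")
        if p.is_file() and p.suffix in suffixes
    )
    return chars // CHARS_PER_TOKEN

tokens = estimate_tokens("./my_project")  # placeholder path
print(f"~{tokens:,} tokens; fits in Scout's 10M window: {tokens < 10_000_000}")
```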
Multilingual
Strong multilingual support across both models for global applications.
Open weights
Both models are fully open-weight under the Llama 4 Community License. Deploy anywhere, modify freely.
Quick selection guide
Which model should you choose?
Match your primary use case to the right Llama 4 variant.
Choose Scout when
- You need to process very long documents (up to 10M tokens)
- You're analyzing an entire codebase across hundreds of files
- You're synthesizing research across many documents
- You're maintaining extended conversation histories
- You want lower memory requirements (109B vs 400B total parameters)
Choose Maverick when
- You want the highest output quality available
- You're tackling complex reasoning or scientific tasks
- You're generating or debugging code
- You're analyzing multimodal inputs (screenshots, diagrams)
- You're working on tasks where benchmark performance matters most
Performance
Complete benchmark comparison
Scout optimizes for context length, Maverick for raw quality. Both deliver strong performance relative to their design goals.
The choice between Scout and Maverick depends on your primary need: massive context or maximum quality. Here's how they compare across key benchmarks.
- Maverick: 80.5% MMLU Pro, 73.4% MMMU, beats GPT-4o on coding
- Scout: 10M token context, 95%+ retrieval at 8M tokens
- Both: 17B active parameters, native multimodal, function calling
- Both: open weights under the Llama 4 Community License
Full comparison
Scout vs Maverick side by side
Complete benchmark results across reasoning, coding, multimodal, and deployment metrics.
| Benchmark | Maverick (128 experts, flagship) | Scout (16 experts, long context) |
|---|---|---|
| MMLU Pro (knowledge & reasoning) | 80.5% | 74.3% |
| GPQA Diamond (scientific knowledge) | 69.8% | 57.2% |
| LiveCodeBench v5 (coding) | 43.4% | 32.8% |
| MMMU (multimodal) | 73.4% | 69.4% |
| Context window (max tokens) | 1M | 10M |
| Total parameters (model size) | 400B | 109B |
| Active parameters (per token) | 17B | 17B |
| Number of experts (MoE routing) | 128 | 16 |
Data from Meta's official model card and independent evaluations.
Scout
Scout: when context length is everything
Scout's 10M token context window is unmatched. It can process entire codebases, multi-document research sets, and hours of transcripts in a single call. If your task involves very long inputs, Scout is the clear choice.
- 10M token context - longest of any open model
- 95%+ retrieval accuracy up to 8M tokens
- 109B total parameters across 16 experts
Maverick
Maverick: when quality is the priority
Maverick's 128-expert architecture delivers frontier-class performance. It outperforms GPT-4o on key benchmarks and is the default model on this site for good reason - it handles complex reasoning, coding, and multimodal tasks with ease.
- 80.5% MMLU Pro - frontier knowledge and reasoning
- Outperforms GPT-4o on coding benchmarks
- 400B total parameters across 128 experts
Try now
Start chatting with Llama 4
Try both models instantly through our chat interface.
Download
Get model weights
Download official weights for either Llama 4 variant.
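A minimal download sketch using the `huggingface_hub` client; the repo id below is an assumption based on Meta's naming pattern, so confirm the exact id on Meta's Hugging Face page and accept the license there first:

```python
from huggingface_hub import snapshot_download

# NOTE: repo id is an assumption; verify it on huggingface.co/meta-llama.
# Llama repos are gated, so accept the license and authenticate first
# (e.g. via the HF_TOKEN environment variable).
path = snapshot_download("meta-llama/Llama-4-Scout-17B-16E-Instruct")
print(f"Weights saved under {path}")
```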
Llama 4 Family
Explore each model and compare with competitors
Dive deeper into each Llama 4 variant or see how they compare against other frontier open models.
Get started
Find your Llama 4 model
Start chatting with either Llama 4 model for free, or download weights for local deployment.