Llama 4 Models

Two models, one family - from long context to frontier quality

The Llama 4 family features two MoE models: Scout for massive context (10M tokens) and Maverick for maximum quality (128 experts, 400B parameters). Both share 17B active parameters per token and native multimodal support.

All models

Choose the right Llama 4 for your use case

Scout and Maverick are optimized for different scenarios. Scout excels at long-context tasks, Maverick at maximum quality.

Llama 4 Scout

10M context window - the long-context specialist

109B total parameters across 16 experts with 17B active per token. The standout feature is its 10 million token context window - the longest of any openly available model.

Choose Scout when you need to process entire codebases, multi-document research sets, or very long conversation histories in a single call.

Llama 4 Maverick

128 experts, 400B parameters - the quality flagship

400B total parameters across 128 experts with 17B active per token. Outperforms GPT-4o on key benchmarks. The default chat model on this site.

Choose Maverick when you need maximum quality for reasoning, coding, multimodal analysis, and complex task completion.

Long Context

Llama 4 Scout

109B total, 17B active, 16 experts. 10M token context window.

Best for: entire codebases, multi-document analysis, long research papers, extended conversations.

Available now

Flagship

Llama 4 Maverick

400B total, 17B active, 128 experts. Outperforms GPT-4o on key reasoning and coding benchmarks.

Best for: complex reasoning, code generation, multimodal tasks, research synthesis.

Available now

Shared capabilities

What both Llama 4 models can do

Scout and Maverick share a common set of capabilities built on Meta's MoE architecture.

Native multimodal

Both models process text and images natively through an early-fusion architecture. No separate encoders or pipelines are needed.

MoE efficiency

Both activate only 17B parameters per token. Scout uses 16 experts (109B total), Maverick uses 128 experts (400B total).
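
The active/total split has a practical consequence: per-token compute scales with the 17B active parameters, while memory for the weights scales with total parameters. The sketch below gives rough sizing, assuming bf16 weights at 2 bytes per parameter and ignoring KV cache, activations, and runtime overhead; the numbers are estimates, not official figures.

    # Back-of-envelope sizing: total parameters drive memory, active parameters
    # drive per-token compute. bf16 = 2 bytes/parameter; KV cache and activations
    # are ignored, so real deployments need more headroom.
    BYTES_PER_PARAM_BF16 = 2

    models = {
        "Scout":    {"total": 109e9, "active": 17e9, "experts": 16},
        "Maverick": {"total": 400e9, "active": 17e9, "experts": 128},
    }

    for name, m in models.items():
        weight_gb = m["total"] * BYTES_PER_PARAM_BF16 / 1e9
        print(f"{name}: ~{weight_gb:,.0f} GB of bf16 weights, "
              f"{m['active'] / 1e9:.0f}B active params per token "
              f"across {m['experts']} experts")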

Function calling

Built-in function calling across both models enables agentic workflows. No fine-tuning required for tool use.
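
As a minimal sketch of what tool use looks like in practice, assuming an OpenAI-compatible endpoint serving Llama 4: the base URL, API key, model name, and the get_weather tool below are placeholders, not official values from Meta or any particular provider.

    # Minimal tool-use sketch against an OpenAI-compatible endpoint serving Llama 4.
    # base_url, api_key, model name, and the get_weather tool are placeholders.
    from openai import OpenAI

    client = OpenAI(base_url="https://your-provider.example/v1", api_key="YOUR_KEY")

    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool for illustration
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }]

    response = client.chat.completions.create(
        model="llama-4-maverick",  # placeholder model name; check your provider
        messages=[{"role": "user", "content": "What's the weather in Lisbon?"}],
        tools=tools,
    )

    # If the model decides to call the tool, the arguments arrive as a JSON string.
    for call in response.choices[0].message.tool_calls or []:
        print(call.function.name, call.function.arguments)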

Extended context

Scout: 10M tokens. Maverick: 1M tokens. Both far exceed previous generation limits.
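
To get a feel for these limits, here is a rough fit check using the common ~4 characters per token heuristic; actual counts depend on the tokenizer and content, and ./my_repo is a hypothetical local checkout.

    # Rough fit check for a long-context request. ~4 characters per token is a
    # heuristic only; real token counts come from the model's tokenizer.
    from pathlib import Path

    CONTEXT_LIMITS = {"llama-4-scout": 10_000_000, "llama-4-maverick": 1_000_000}

    def estimated_tokens(root, suffixes=(".py", ".md", ".txt")):
        chars = sum(len(p.read_text(errors="ignore"))
                    for p in Path(root).rglob("*")
                    if p.suffix in suffixes)
        return chars // 4  # rough chars-per-token heuristic

    tokens = estimated_tokens("./my_repo")  # hypothetical local checkout
    for model, limit in CONTEXT_LIMITS.items():
        print(f"{model}: ~{tokens:,} tokens, fits: {tokens <= limit}")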

Multilingual

Strong multilingual support across both models for global applications.

Open weights

Both models are fully open-weight under the Llama 4 Community License. Deploy anywhere, modify freely.
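
A local-weights sketch using Hugging Face transformers follows. It assumes a recent transformers release with chat-template support and the accelerate package; the model ID is the one Meta has published for Scout on the Hub, but verify the exact name and accept the license there before running. Scout's bf16 weights alone are roughly 200+ GB, so multi-GPU setups or quantization are typically required.

    # Local deployment sketch with Hugging Face transformers. The model ID is
    # assumed; confirm it (and license acceptance) on the Hub before use.
    from transformers import pipeline

    generator = pipeline(
        "text-generation",
        model="meta-llama/Llama-4-Scout-17B-16E-Instruct",  # assumed Hub ID
        device_map="auto",     # requires accelerate; spreads weights across GPUs
        torch_dtype="auto",
    )

    messages = [{"role": "user",
                 "content": "Summarize the Llama 4 family in one sentence."}]
    result = generator(messages, max_new_tokens=128)
    print(result[0]["generated_text"][-1]["content"])  # last turn is the reply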

Quick selection guide

Which model should you choose?

Match your primary use case to the right Llama 4 variant; a minimal selection sketch follows the two lists below.

Choose Scout when

  • You need to process very long documents (up to 10M tokens)
  • You're analyzing an entire codebase across hundreds of files
  • You're synthesizing research across many documents
  • You're working with extended conversation histories
  • You want lower memory requirements (109B vs 400B total parameters)

Choose Maverick when

  • Maximum quality is the priority
  • Complex reasoning and scientific tasks
  • Code generation and debugging
  • Multimodal analysis (screenshots, diagrams)
  • Tasks where benchmark performance matters most
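
The selection sketch below turns the criteria above into a simple routing rule. Thresholds and model names are illustrative, not official identifiers.

    # Minimal selection sketch: route a request to Scout or Maverick based on
    # input length and quality needs. Model names are placeholders.
    def pick_llama4(prompt_tokens: int, needs_max_quality: bool) -> str:
        MAVERICK_CONTEXT = 1_000_000
        SCOUT_CONTEXT = 10_000_000
        if prompt_tokens > SCOUT_CONTEXT:
            raise ValueError("Input exceeds Scout's 10M-token window; chunk it first.")
        if prompt_tokens > MAVERICK_CONTEXT:
            return "llama-4-scout"  # only Scout can take the whole input
        return "llama-4-maverick" if needs_max_quality else "llama-4-scout"

    print(pick_llama4(prompt_tokens=3_000_000, needs_max_quality=True))  # llama-4-scout
    print(pick_llama4(prompt_tokens=50_000, needs_max_quality=True))     # llama-4-maverick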

Performance

Complete benchmark comparison

Scout optimizes for context length, Maverick for raw quality. Both deliver strong performance relative to their design goals.

The choice between Scout and Maverick depends on your primary need: massive context or maximum quality. Here's how they compare across key benchmarks.

Llama 4 family performance comparison

Maverick: 80.5% MMLU Pro, 73.4% MMMU, beats GPT-4o on coding

Scout: 10M token context, 95%+ retrieval at 8M tokens

Both: 17B active parameters, native multimodal, function calling

Both: open weights under the Llama 4 Community License

Full comparison

Scout vs Maverick side by side

Complete benchmark results across reasoning, coding, multimodal, and deployment metrics.

Benchmark                            | Maverick (128 experts, Flagship) | Scout (16 experts, Long Context)
MMLU Pro (Knowledge & reasoning)     | 80.5%                            | 74.3%
GPQA Diamond (Scientific knowledge)  | 69.8%                            | 57.2%
LiveCodeBench v5 (Coding)            | 43.4%                            | 32.8%
MMMU (Multimodal)                    | 73.4%                            | 69.4%
Context Window (Max tokens)          | 1M                               | 10M
Total Parameters (Model size)        | 400B                             | 109B
Active Parameters (Per token)        | 17B                              | 17B
Number of Experts (MoE routing)      | 128                              | 16

Data from Meta's official model card and independent evaluations.

Scout

Scout: when context length is everything

Scout's 10M token context window is unmatched. It can process entire codebases, multi-document research sets, and hours of transcripts in a single call. If your task involves very long inputs, Scout is the clear choice.

  • 10M token context - longest of any open model
  • 95%+ retrieval accuracy up to 8M tokens
  • 109B total parameters across 16 experts
Llama 4 Scout - long context specialist

Maverick

Maverick: when quality is the priority

Maverick's 128-expert architecture delivers frontier-class performance. It outperforms GPT-4o on key benchmarks and is the default model on this site for good reason - it handles complex reasoning, coding, and multimodal tasks with ease.

  • 80.5% MMLU Pro - frontier knowledge and reasoning
  • Outperforms GPT-4o on coding benchmarks
  • 400B total parameters across 128 experts
Llama 4 Maverick - frontier quality

Llama 4 Family

Explore each model and compare with competitors

Dive deeper into each Llama 4 variant or see how they compare against other frontier open models.

Llama 4 Scout

10M context window specialist

Explore

Llama 4 Maverick

128-expert flagship model

Explore

Llama 4 vs Kimi K2.6

Meta vs Moonshot comparison

Compare

Llama 4 vs Qwen 3.6

Meta vs Alibaba comparison

Compare

Llama 4 vs DeepSeek V4

MoE architecture showdown

Compare

Llama 4 vs MiniMax M2.7

Scale vs efficiency

Compare

Get started

Find your Llama 4 model

Start chatting with either Llama 4 model for free, or download weights for local deployment.
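
For hosted use, a single chat request is enough to get started. The sketch below assumes an OpenAI-compatible endpoint; the base URL, API key, and model name are placeholders for whichever provider you choose.

    # Quickstart sketch: one chat completion against an OpenAI-compatible endpoint.
    # base_url, api_key, and model name are placeholders.
    from openai import OpenAI

    client = OpenAI(base_url="https://your-provider.example/v1", api_key="YOUR_KEY")
    reply = client.chat.completions.create(
        model="llama-4-maverick",  # or "llama-4-scout" for very long inputs
        messages=[{"role": "user",
                   "content": "Give me three uses for a 10M-token context window."}],
    )
    print(reply.choices[0].message.content)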