Llama 4 Scout

10 million tokens of context - the longest window in any open model

Llama 4 Scout is Meta's long-context specialist. With 109B total parameters, 17B active per token across 16 experts, and a 10M token context window, it can process entire codebases, multi-document research libraries, and hours of conversation history in a single call.

Model variants

Instruction-tuned and base models

Choose between the instruction-tuned variant optimized for chat and long-context tasks, or the base model for fine-tuning and custom applications.

Mixture-of-Experts Architecture

109B total parameters, 17B active per token

Llama 4 Scout uses a sparse MoE design with 16 experts, activating 17B parameters per forward pass. The standout feature is its 10 million token context window - the longest of any openly available model.

Ideal for tasks that require processing massive amounts of text: entire codebases, multi-document analysis, long research papers, and extended conversation histories.

Instruction-tuned

Scout Instruct

Optimized for conversational AI and long-context task completion

Fine-tuned for following instructions, multi-turn dialogue, and processing very long inputs

Available now

Pre-trained

Scout Base

Foundation MoE model for fine-tuning and specialized applications

Pre-trained on diverse multimodal data with 16-expert routing

Available now

Capabilities

Built for massive context and multimodal understanding

Llama 4 Scout combines an unprecedented 10M token context window with MoE efficiency, native multimodal support, and strong reasoning capabilities.

10M token context window

The longest context window of any openly available model. Process entire codebases, multi-document research libraries, or hours of conversation in a single call.

MoE efficiency

Activates only 17B parameters per token from a 109B pool across 16 experts. Strong performance at a fraction of the compute cost of dense models.
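The routing idea behind this efficiency can be sketched in a few lines: a small gating network scores every expert for each token, and only the top-scoring expert(s) actually run. This is a generic top-k router sketch, not Meta's implementation; the expert count matches Scout's 16, but the logits are illustrative.

```python
import math

NUM_EXPERTS = 16  # Scout's expert count

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route_token(router_logits, top_k=1):
    """Pick the top-k experts for one token from the router's scores."""
    probs = softmax(router_logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:top_k]
    # Renormalize the chosen experts' weights so they sum to 1
    total = sum(probs[i] for i in chosen)
    return [(i, probs[i] / total) for i in chosen]

# Toy router scores for one token across 16 experts
logits = [0.1 * i for i in range(NUM_EXPERTS)]
print(route_token(logits))  # → [(15, 1.0)]: only expert 15 runs for this token
```

Only the chosen experts' feed-forward weights are touched per token, which is why a 109B-parameter model can run with 17B-parameter inference cost.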

Code analysis at scale

Load entire repositories into context for cross-file analysis, dependency tracking, and large-scale refactoring tasks.
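One simple way to use the window this way is to flatten a repository into a single prompt, tagging each file with its path so the model can cite files by name. A minimal sketch; the extension filter, the `### FILE:` delimiter, and the character budget are arbitrary choices for illustration, not part of any Llama 4 API.

```python
from pathlib import Path

def repo_to_prompt(root, exts=(".py", ".md"), max_chars=40_000_000):
    """Concatenate matching files under `root` into one long prompt.

    Each file is prefixed with a '### FILE:' header so the model can
    cite specific files in its answer. `max_chars` is a rough guard:
    at ~4 characters per token, 40M chars is near the 10M-token window.
    """
    parts = []
    total = 0
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in exts:
            continue
        chunk = f"### FILE: {path.relative_to(root)}\n{path.read_text(errors='ignore')}\n"
        if total + len(chunk) > max_chars:
            break  # stop before overflowing the context budget
        parts.append(chunk)
        total += len(chunk)
    return "".join(parts)
```

The resulting string goes into the prompt as-is, letting a single call answer cross-file questions without a retrieval pipeline.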

Agentic workflows

Native function calling and tool use support enables autonomous agents. Build workflows that chain multiple tools without fine-tuning.
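The tool-use loop follows a common pattern: the model emits a structured call, the host executes it, and the result is fed back for the next turn. The JSON shape and the `get_weather` tool below are invented for this sketch; consult Meta's model card for the exact tool-call syntax Scout was trained on.

```python
import json

# Hypothetical tool registry; in a real agent these would be your APIs.
TOOLS = {
    "get_weather": lambda city: f"Sunny in {city}",
}

def run_agent_step(model_output):
    """Parse a tool call emitted by the model, execute it, and return
    the result to append to the conversation for the next turn."""
    call = json.loads(model_output)
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

# The model would emit something like:
print(run_agent_step('{"name": "get_weather", "arguments": {"city": "Paris"}}'))
# → Sunny in Paris
```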

Multilingual support

Strong performance across multiple languages with cultural context understanding for global applications.

Native multimodal

Process text and images together with early fusion architecture. Analyze screenshots, diagrams, and documents alongside text.

Key highlights

Why Scout's context window matters

A 10M token context window changes what's possible with a single model call.

What you can fit in 10M tokens

  • An entire medium-sized codebase (50K+ lines across hundreds of files)
  • Multiple research papers or an entire book
  • Hours of meeting transcripts or conversation history
  • Complete documentation sets for complex systems
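Back-of-the-envelope arithmetic shows why these fit. Assuming a rough average of ~4 characters per token (the actual ratio varies by tokenizer and content):

```python
CHARS_PER_TOKEN = 4  # rough average; actual ratio depends on the tokenizer

def approx_tokens(n_chars):
    return n_chars // CHARS_PER_TOKEN

# A 50K-line codebase at ~40 characters per line:
codebase_tokens = approx_tokens(50_000 * 40)
print(codebase_tokens)  # 500000 -- about 5% of the 10M window

# An 80K-word book at ~6 characters per word including spaces:
book_tokens = approx_tokens(80_000 * 6)
print(book_tokens)  # 120000 tokens
```

Even a large codebase consumes only a few percent of the window, leaving room for instructions, transcripts, and multi-turn history in the same call.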

Technical specs

  • 109B total parameters, 17B active per token
  • 16 experts in MoE architecture
  • 10M token context window
  • Native multimodal (text + image)
  • Llama 3.1 compatible license

Performance

Long-context specialist with competitive reasoning

Llama 4 Scout delivers strong performance across standard benchmarks while offering an unmatched 10M token context window for long-document tasks.

Scout is optimized for tasks that require processing large amounts of context. While Maverick leads on raw benchmark scores, Scout's 10M context window makes it the clear choice for long-document workflows.

Llama 4 Scout performance comparison chart

10M token context window - longest of any open model

95%+ retrieval accuracy up to 8M tokens

17B active parameters from 109B total (16 experts)

Competitive with models 2-3x its active parameter count

Native multimodal support for text and image inputs

Benchmark comparison

Scout vs Maverick and the Llama 4 family

Scout trades some raw benchmark performance for its massive context window advantage.

Benchmark                             Llama 4 Scout   Llama 4 Maverick   Llama 3.1 70B
                                      (16 experts)    (128 experts)      (dense)
MMLU Pro (knowledge & reasoning)      74.3%           80.5%              66.4%
GPQA Diamond (scientific knowledge)   57.2%           69.8%              46.7%
LiveCodeBench v5 (coding)             32.8%           43.4%              28.5%
MMMU (multimodal)                     69.4%           73.4%              -
Context window (max tokens)           10M             1M                 128K
Total parameters                      109B            400B               70B
Active parameters (per token)         17B             17B                70B

Data from Meta's official model card and independent evaluations.

Long Context

10M tokens: process entire codebases in one call

Scout's 10M token context window is the longest of any openly available model. Load entire repositories, multi-document research sets, or hours of transcripts into a single context for comprehensive analysis.

  • 95%+ retrieval accuracy up to 8M tokens in needle-in-a-haystack tests
  • 89% accuracy at the full 10M token limit
  • Process 50K+ lines of code across hundreds of files simultaneously

Llama 4 Scout MoE architecture

MoE Architecture

109B capacity at 17B inference cost

Scout's 16-expert MoE architecture activates only 17B parameters per token while maintaining the representational capacity of a much larger model. This makes it practical to deploy on a single node while still delivering strong performance.

  • 16 experts with 17B active parameters per forward pass
  • Same active parameter count as Maverick at lower total memory
  • Practical for single-node deployment scenarios

Llama 4 Scout 10M context window

Download & deploy

Self-hosted deployment

Download official model weights for deployment on your infrastructure.

Llama 4 Family

Explore the full Llama 4 lineup

Scout is part of Meta's Llama 4 family. Compare it with Maverick and see how it stacks up against other open models.

Llama 4 Maverick

400B MoE flagship with 128 experts

Compare

All Llama 4 Models

Complete family overview

View all

Llama 4 vs Kimi K2.6

Scout/Maverick vs Moonshot's 1T model

Compare

Llama 4 vs Qwen 3.6

Meta vs Alibaba's latest

Compare

Llama 4 vs DeepSeek V4

MoE architecture showdown

Compare

Llama 4 vs MiniMax M2.7

Context vs cost efficiency

Compare

Get started

Ready to try Llama 4 Scout?

Start chatting instantly for free, or download the model for self-hosted deployment. The 10M token context window is waiting.