Llama 4 Scout
10 million tokens of context - the longest window in any open model
Llama 4 Scout is Meta's long-context specialist. With 109B total parameters, 17B active per token across 16 experts, and a 10M token context window, it can process entire codebases, multi-document research libraries, and hours of conversation history in a single call.
Model variants
Instruction-tuned and base models
Choose between the instruction-tuned variant optimized for chat and long-context tasks, or the base model for fine-tuning and custom applications.
Mixture-of-experts architecture
109B total parameters, 17B active per token
Llama 4 Scout uses a sparse MoE design with 16 experts, activating 17B parameters per forward pass. The standout feature is its 10 million token context window - the longest of any openly available model.
Ideal for tasks that require processing massive amounts of text: entire codebases, multi-document analysis, long research papers, and extended conversation histories.
Instruction-tuned
Scout Instruct
Optimized for conversational AI and long-context task completion
Fine-tuned for following instructions, multi-turn dialogue, and processing very long inputs
Pre-trained
Scout Base
Foundation MoE model for fine-tuning and specialized applications
Pre-trained on diverse multimodal data with 16-expert routing
Capabilities
Built for massive context and multimodal understanding
Llama 4 Scout combines an unprecedented 10M token context window with MoE efficiency, native multimodal support, and strong reasoning capabilities.
10M token context window
The longest context window of any openly available model. Process entire codebases, multi-document research libraries, or hours of conversation in a single call.
MoE efficiency
Activates only 17B parameters per token from a 109B pool across 16 experts. Strong performance at a fraction of the compute cost of dense models.
Code analysis at scale
Load entire repositories into context for cross-file analysis, dependency tracking, and large-scale refactoring tasks.
Agentic workflows
Native function calling and tool use support enables autonomous agents. Build workflows that chain multiple tools without fine-tuning.
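A tool-use request for this kind of workflow can be sketched as a payload for an OpenAI-compatible serving endpoint. The model ID, payload shape, and the `get_weather` tool below are illustrative assumptions, not an official API:

```python
import json

def build_tool_call_request(user_message: str) -> dict:
    """Build a chat request that exposes one tool to the model."""
    return {
        # Assumed model ID; check your serving provider's catalog.
        "model": "meta-llama/Llama-4-Scout-17B-16E-Instruct",
        "messages": [{"role": "user", "content": user_message}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",  # hypothetical tool for illustration
                    "description": "Get current weather for a city",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            }
        ],
    }

payload = build_tool_call_request("What's the weather in Oslo?")
print(json.dumps(payload, indent=2))
```

When the model decides a tool is needed, the response contains a `tool_calls` entry whose arguments your code executes before sending the result back in a follow-up message.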
Multilingual support
Strong performance across multiple languages with cultural context understanding for global applications.
Native multimodal
Process text and images together with early fusion architecture. Analyze screenshots, diagrams, and documents alongside text.
Key highlights
Why Scout's context window matters
A 10M token context window changes what's possible with a single model call.
What you can fit in 10M tokens
- An entire medium-sized codebase (50K+ lines across hundreds of files)
- Multiple research papers or an entire book
- Hours of meeting transcripts or conversation history
- Complete documentation sets for complex systems
- And recall stays reliable at that scale: 95%+ retrieval accuracy up to 8M tokens in needle-in-a-haystack tests
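Whether a given corpus fits can be estimated before sending anything. This sketch uses the rough ~4 characters/token heuristic; an exact count requires the actual Llama 4 tokenizer:

```python
from pathlib import Path

CHARS_PER_TOKEN = 4          # rough heuristic, not the real tokenizer
CONTEXT_LIMIT = 10_000_000   # Scout's 10M token window

def estimate_tokens(root: str, suffixes=(".py", ".js", ".ts", ".go")) -> int:
    """Approximate the token count of all source files under root."""
    total_chars = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            total_chars += len(path.read_text(errors="ignore"))
    return total_chars // CHARS_PER_TOKEN

# Example usage:
# tokens = estimate_tokens("path/to/repo")
# print(f"~{tokens:,} tokens ({tokens / CONTEXT_LIMIT:.1%} of the 10M window)")
```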
Technical specs
- 109B total parameters, 17B active per token
- 16 experts in MoE architecture
- 10M token context window
- Native multimodal (text + image)
- Released under the Llama 4 Community License
Performance
Long-context specialist with competitive reasoning
Llama 4 Scout delivers strong performance across standard benchmarks while offering an unmatched 10M token context window for long-document tasks.
Scout is optimized for tasks that require processing large amounts of context. While Maverick leads on raw benchmark scores, Scout's 10M context window makes it the clear choice for long-document workflows.
10M token context window - longest of any open model
95%+ retrieval accuracy up to 8M tokens
17B active parameters from 109B total (16 experts)
Competitive with models 2-3x its active parameter count
Native multimodal support for text and image inputs
Benchmark comparison
Scout vs Maverick and the Llama 4 family
Scout trades some raw benchmark performance for its massive context window advantage.
| Benchmark | Llama 4 Scout (16 experts) | Llama 4 Maverick (128 experts) | Llama 3.1 70B (dense) |
|---|---|---|---|
| MMLU Pro (knowledge & reasoning) | 74.3% | 80.5% | 66.4% |
| GPQA Diamond (scientific knowledge) | 57.2% | 69.8% | 46.7% |
| LiveCodeBench v5 (coding) | 32.8% | 43.4% | 28.5% |
| MMMU (multimodal) | 69.4% | 73.4% | - |
| Context window (max tokens) | 10M | 1M | 128K |
| Total parameters | 109B | 400B | 70B |
| Active parameters (per token) | 17B | 17B | 70B |
Data from Meta's official model card and independent evaluations.
Long context
10M tokens: process entire codebases in one call
Scout's 10M token context window is the longest of any openly available model. Load entire repositories, multi-document research sets, or hours of transcripts into a single context for comprehensive analysis.
- 95%+ retrieval accuracy up to 8M tokens in needle-in-a-haystack tests
- 89% accuracy at the full 10M token limit
- Process 50K+ lines of code across hundreds of files simultaneously
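Putting a repository into a single context is mostly a matter of concatenating files with clear boundaries so the model can attribute code to paths. A minimal sketch, where the file filter and `### FILE:` header format are illustrative choices:

```python
from pathlib import Path

def repo_to_prompt(root: str, suffixes=(".py",)) -> str:
    """Concatenate source files into one prompt, tagging each with its path."""
    parts = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            rel = path.relative_to(root)
            parts.append(f"### FILE: {rel}\n{path.read_text(errors='ignore')}")
    return "\n\n".join(parts)

# Example usage:
# prompt = repo_to_prompt("path/to/repo")
# then send prompt plus your question in a single request
```

Sorting the paths keeps the ordering deterministic, which helps when comparing model answers across runs.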
MoE architecture
109B capacity at 17B inference cost
Scout's 16-expert MoE architecture activates only 17B parameters per token while maintaining the representational capacity of a much larger model. This makes it practical to deploy on a single node while still delivering strong performance.
- 16 experts with 17B active parameters per forward pass
- Same active parameter count as Maverick at lower total memory
- Practical for single-node deployment scenarios
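The single-node claim follows from back-of-envelope weight-memory arithmetic. This sketch covers weights only, at assumed precisions, and ignores KV cache, activations, and runtime overhead:

```python
def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bytes_per_param

# Scout's full 109B parameter pool at common precisions:
for precision, bytes_pp in [("bf16", 2.0), ("fp8", 1.0), ("int4", 0.5)]:
    print(f"{precision}: ~{weight_memory_gb(109, bytes_pp):.1f} GB of weights")
```

At bf16 that is roughly 218 GB, which fits across the GPUs of a single 8-GPU node, while quantized variants shrink the footprint further; only 17B of those parameters are read per token, which is what keeps per-token compute low.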
Get started
Try Llama 4 Scout now
Start chatting instantly or download weights for self-hosted deployment.
Download & deploy
Self-hosted deployment
Download official model weights for deployment on your infrastructure.
Llama 4 family
Explore the full Llama 4 lineup
Scout is part of Meta's Llama 4 family. Compare it with Maverick and see how it stacks up against other open models.
Get started
Ready to try Llama 4 Scout?
Start chatting instantly for free, or download the model for self-hosted deployment. The 10M token context window is waiting.