Llama 4 Models

Two models, one family - from long context to frontier quality

The Llama 4 models are Meta's most ambitious open-weight release to date. The family features two mixture-of-experts (MoE) architectures designed for different priorities: Scout handles very long inputs with a 10 million token context window and 16 experts, while Maverick delivers frontier-class quality with 128 experts and 400B total parameters. Both activate 17B parameters per token and support text and image input natively, so you can choose the balance of context length and output quality that fits your workflow.

All models

Choose the right option from the Llama 4 models

Scout and Maverick are optimized for different scenarios. Understanding their strengths helps you pick the variant that matches your workload, whether that means processing entire codebases or generating the highest quality reasoning and code.

Llama 4 Scout

10M context window - the long-context specialist

Scout has 109B total parameters across 16 experts, with 17B active per token. Its standout feature is the 10 million token context window, the longest of any openly available model. Scout excels when a task requires ingesting large volumes of information at once, from entire repositories to multi-document research collections. Needle-in-a-haystack tests report 95%+ retrieval accuracy up to 8 million tokens.

Choose Scout when you need to process entire codebases, multi-document research sets, or very long conversation histories in a single call. It is the best option when context length matters more than marginal quality differences.

Llama 4 Maverick

128 experts, 400B parameters - the quality flagship

Maverick has 400B total parameters across 128 experts, with 17B active per token. It outperforms GPT-4o on key benchmarks including MMLU Pro, GPQA Diamond, and LiveCodeBench. The 128-expert architecture provides deep specialization across domains, making it one of the strongest open-weight models available for reasoning, coding, and multimodal tasks. Its 1M token context window covers most production needs.

Choose Maverick when you need maximum quality for reasoning, coding, multimodal analysis, and complex task completion. It is the default chat model on this site for good reason.

Long Context

Llama 4 Scout

109B total, 17B active, 16 experts. 10M token context window.

Best for: entire codebases, multi-document analysis, long research papers, extended conversations.

Available now

Flagship

Llama 4 Maverick

400B total, 17B active, 128 experts. Outperforms GPT-4o on MMLU Pro, GPQA Diamond, and LiveCodeBench.

Best for: complex reasoning, code generation, multimodal tasks, research synthesis.

Available now

Shared capabilities

What all Llama 4 models can do

Scout and Maverick share a common set of capabilities built on Meta's mixture of experts architecture. These shared foundations mean you can switch between the two variants without changing your integration code.

Native multimodal

Both Llama 4 models process text and images natively using an early-fusion architecture. Visual understanding is built in from the ground up rather than bolted on as a separate encoder. You can therefore send mixed content, such as screenshots, diagrams, and documents alongside text, and get coherent reasoning across both modalities.

MoE efficiency

Both Llama 4 models activate only 17B parameters per token despite their large total parameter counts. Scout uses 16 experts with 109B total, Maverick uses 128 experts with 400B total. This sparse routing strategy delivers strong performance at a fraction of the compute cost of equivalent dense architectures.
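As a rough illustration of the sparsity described above, the following sketch computes what share of each model's total parameters is active for a single token, using only the figures quoted in this section. The `MODELS` dictionary and the `active_fraction` helper are illustrative names, not part of any Llama tooling.

```python
# Parameter counts in billions, as quoted in this section.
MODELS = {
    "Scout": {"total_b": 109, "active_b": 17, "experts": 16},
    "Maverick": {"total_b": 400, "active_b": 17, "experts": 128},
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the total parameters used for a single token."""
    return active_b / total_b


if __name__ == "__main__":
    for name, spec in MODELS.items():
        share = active_fraction(spec["total_b"], spec["active_b"])
        print(f"{name}: {share:.1%} of parameters active per token")
```

By this measure Scout activates roughly 16% of its parameters per token and Maverick roughly 4%. All of the weights still have to be held in memory, even though only a fraction is used for each token.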

Function calling

Built-in function calling across both Llama 4 models enables agentic workflows without additional fine-tuning. Define your tools, and the model will decide when and how to call them. This makes it straightforward to build autonomous agents that query databases, call APIs, execute code, and chain operations together.
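A minimal sketch of the define-then-dispatch loop described above. The tool description uses the JSON-schema style that many chat-completion endpoints accept, but the exact request format depends on your serving stack, so treat the field names as assumptions. `get_row_count`, `TOOLS`, `REGISTRY`, and `dispatch_tool_call` are hypothetical names, and the model's reply is simulated rather than fetched from a real endpoint.

```python
import json


def get_row_count(table: str) -> int:
    """Hypothetical tool: a stand-in for a real database query."""
    fake_db = {"users": 1200, "orders": 5400}
    return fake_db.get(table, 0)


# Tool description sent alongside the prompt (assumed JSON-schema style).
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_row_count",
            "description": "Return the number of rows in a table.",
            "parameters": {
                "type": "object",
                "properties": {"table": {"type": "string"}},
                "required": ["table"],
            },
        },
    }
]

# Maps tool names the model may emit to local Python callables.
REGISTRY = {"get_row_count": get_row_count}


def dispatch_tool_call(call: dict) -> str:
    """Run one tool call and return a JSON string to feed back to the model."""
    name = call["name"]
    if name not in REGISTRY:
        return json.dumps({"error": f"unknown tool: {name}"})
    args = json.loads(call["arguments"])  # arguments arrive as a JSON string
    return json.dumps({"result": REGISTRY[name](**args)})


if __name__ == "__main__":
    # Simulated model output; a real response would come from your endpoint.
    simulated_call = {"name": "get_row_count", "arguments": '{"table": "orders"}'}
    print(dispatch_tool_call(simulated_call))  # prints {"result": 5400}
```

Keeping an explicit registry means the model can only trigger functions you have deliberately exposed, and unknown tool names come back as an error instead of raising.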

Extended context

Scout offers a 10M token context window for extreme long-document tasks, while Maverick provides 1M tokens for most production scenarios. Both far exceed the 128K limit of previous-generation models, giving you room to include more context, more examples, and more history in every request.

Multilingual

Strong multilingual support across both Llama 4 models enables global applications. Whether your users communicate in English, Chinese, Spanish, French, or other supported languages, both variants maintain consistent quality with culturally aware responses.

Open weights

Both Llama 4 models are open-weight and released under the Llama 4 Community License. You can deploy them on your own infrastructure, modify them, and fine-tune them for your specific needs, subject to the license terms. This openness reduces vendor lock-in and lets you inspect and run the models entirely on your own hardware.

Quick selection guide

Which of the Llama 4 models should you choose?

Match your primary use case to the right variant.

Choose Scout when

  • You need to process very long documents (up to 10M tokens)
  • You are analyzing an entire codebase across hundreds of files
  • You are doing multi-document research and synthesis
  • You need to keep extended conversation histories in context
  • You want lower memory requirements (109B vs 400B total parameters)

Choose Maverick when

  • Maximum quality is the priority
  • You are tackling complex reasoning and scientific tasks
  • You are generating or debugging code
  • You need multimodal analysis (screenshots, diagrams)
  • Benchmark performance matters most for your task

Performance

Complete benchmark comparison across Llama 4 models

Scout optimizes for context length, Maverick for raw quality. Both deliver strong performance relative to their design goals.

Choosing between the Llama 4 models comes down to your primary need. If your workflow involves processing large volumes of text, code, or documents in a single call, Scout's 10M token context window is unmatched. If you need the highest possible quality for reasoning, coding, or multimodal tasks, Maverick's 128-expert architecture delivers frontier-class results that compete with the best proprietary offerings. Many teams use both: Maverick for quality-critical tasks and Scout for large-scale analysis.

Llama 4 family performance comparison

Maverick: 80.5% MMLU Pro, 73.4% MMMU, beats GPT-4o on coding

Scout: 10M token context, 95%+ retrieval at 8M tokens

Both: 17B active parameters, native multimodal, function calling

Both: open-weight under the Llama 4 Community License

Full comparison

Scout vs Maverick side by side

Complete benchmark results across reasoning, coding, multimodal, and deployment metrics.

Benchmark                                 Maverick       Scout
                                          128 experts    16 experts
                                          Flagship       Long Context

MMLU Pro (knowledge & reasoning)          80.5%          74.3%
GPQA Diamond (scientific knowledge)       69.8%          57.2%
LiveCodeBench v5 (coding)                 43.4%          32.8%
MMMU (multimodal)                         73.4%          69.4%
Context window (max tokens)               1M             10M
Total parameters (model size)             400B           109B
Active parameters (per token)             17B            17B
Number of experts (MoE routing)           128            16

Data from Meta's official model card and independent evaluations.

Scout

Llama 4 Scout: when context length is everything

Scout's 10M token context window is unmatched among the Llama 4 models and across the entire open-weight landscape. It can process entire codebases, multi-document research sets, and hours of transcripts in a single call. If your task involves very long inputs, Scout is the clear choice.

  • 10M token context, the longest of any open model available today
  • 95%+ retrieval accuracy up to 8M tokens in needle-in-a-haystack tests
  • 109B total parameters across 16 experts with 17B active per token
  • Process entire GitHub repositories for comprehensive code review
  • Ideal for legal document analysis, research synthesis, and audit workflows

Maverick

Llama 4 Maverick: when quality is the priority

Maverick's 128-expert architecture delivers frontier-class performance that outperforms GPT-4o on key benchmarks. It is the default model on this site for good reason: it handles complex reasoning, coding, and multimodal tasks with the quality you would expect from the best proprietary alternatives.

  • 80.5% MMLU Pro for frontier-class knowledge and reasoning
  • Outperforms GPT-4o on coding benchmarks with 43.4% on LiveCodeBench v5
  • 400B total parameters across 128 experts for deep domain specialization
  • 73.4% on MMMU for strong multimodal understanding of images and documents
  • Native function calling for building autonomous agent workflows

Selection Guide

Choosing the right option from the Llama 4 models

Picking between the Llama 4 models depends on what matters most for your specific workflow. Both share the same 17B active parameter footprint and native multimodal support, so the decision comes down to context length versus output quality. Many teams find value in using both variants for different parts of their pipeline.

  • Pick Scout for tasks that require processing more than 1 million tokens at once
  • Pick Maverick for tasks where output quality and reasoning depth matter most
  • Both share 17B active parameters, so per-token compute is comparable, although Maverick needs more memory to hold its 400B total parameters
  • Use Scout for ingestion and analysis, then Maverick for synthesis and generation
  • Both run under the same open-weight license, so you can deploy either or both freely
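The selection rules above can be sketched as a small helper. This is only an illustration of the guidance in this section, not an official routing policy: `choose_variant` and its `prefer_low_memory` flag are hypothetical, and the window sizes are the figures quoted here. In practice you would also leave headroom inside the window for the model's output tokens.

```python
MAVERICK_WINDOW = 1_000_000   # tokens, as quoted in this section
SCOUT_WINDOW = 10_000_000     # tokens, as quoted in this section


def choose_variant(input_tokens: int, prefer_low_memory: bool = False) -> str:
    """Pick a Llama 4 variant from the input size and a memory preference."""
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    if input_tokens > SCOUT_WINDOW:
        # Neither model can take this in one call; the caller must split it.
        raise ValueError("input exceeds the 10M token window; split the input")
    if input_tokens > MAVERICK_WINDOW:
        # Only Scout's window is large enough.
        return "scout"
    # Both fit: default to Maverick for quality unless memory is the constraint.
    return "scout" if prefer_low_memory else "maverick"


if __name__ == "__main__":
    print(choose_variant(3_500_000))                       # prints scout
    print(choose_variant(40_000))                          # prints maverick
    print(choose_variant(40_000, prefer_low_memory=True))  # prints scout
```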

Download

Get model weights

Download official weights for either Llama 4 variant.

FAQ

Frequently asked questions about Llama 4 models

Answers to the most common questions about choosing, running, and deploying the Llama 4 models for your projects.

How many Llama 4 models are available right now?

There are currently two Llama 4 models: Scout and Maverick. Each comes in two variants, an instruction-tuned version optimized for chat and task completion, and a base pre-trained version for fine-tuning and research. That gives you four total checkpoints to choose from depending on whether you need a ready-to-use conversational model or a foundation for custom training.

Which Llama 4 model is best for coding tasks?

Maverick is the stronger choice for coding tasks. It scores 43.4% on LiveCodeBench v5, outperforming both Scout (32.8%) and GPT-4o (37.0%). The 128-expert architecture provides deep specialization across programming languages and frameworks. However, if you need to analyze an entire large codebase at once, Scout's 10M token context window lets you load everything into a single call for cross-file analysis.

Can I run any Llama 4 model on a consumer GPU?

Running the unquantized models requires a multi-GPU setup. At 16-bit precision, Scout's weights need approximately 220 GB of VRAM and Maverick's need around 800 GB. Quantization reduces these requirements significantly. Scout with INT4 quantization fits in roughly 55 GB, which is within reach of a single data-center GPU or several high-end consumer GPUs. Maverick with INT4 still needs around 200 GB, making it better suited to cloud or enterprise hardware. These figures cover the weights only; long contexts add KV-cache memory on top.
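The memory figures above follow from simple arithmetic: total parameters multiplied by bytes per parameter. The sketch below assumes 16-bit weights for the unquantized case and 4 bits per parameter for INT4, and it counts weights only, ignoring KV cache, activations, and quantization overhead. `weight_memory_gb` is an illustrative helper, not part of any library.

```python
def weight_memory_gb(total_params_b: float, bits_per_param: int) -> float:
    """Approximate weights-only memory in GB (1 GB = 1e9 bytes).

    total_params_b is the parameter count in billions.
    """
    return total_params_b * bits_per_param / 8


if __name__ == "__main__":
    for name, params_b in {"Scout": 109, "Maverick": 400}.items():
        for label, bits in {"16-bit": 16, "INT4": 4}.items():
            gb = weight_memory_gb(params_b, bits)
            print(f"{name} {label}: ~{gb:.1f} GB")
```

This gives about 218 GB and 54.5 GB for Scout, and 800 GB and 200 GB for Maverick, which matches the rounded figures quoted above.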

What is the difference between Scout and Maverick in the Llama 4 family?

Scout is optimized for long-context tasks with a 10M token window and 16 experts (109B total parameters). Maverick prioritizes output quality with 128 experts and 400B total parameters but has a 1M token context window. Both activate 17B parameters per token. Think of Scout as the wide-angle lens and Maverick as the high-resolution lens in the same camera system.

Are all Llama 4 models free and open weight?

Yes. All Llama 4 models are released under the Llama 4 Community License, which permits commercial use, fine-tuning, and redistribution subject to its terms. You can deploy them on your own infrastructure, build products on top of them, and modify the weights for your specific needs. The license includes a threshold for very large-scale deployments: companies with more than 700 million monthly active users must request a separate license from Meta.

Which Llama 4 model should I choose for document analysis?

It depends on the volume and complexity of your documents. For analyzing large collections of documents, contracts, or research papers in a single pass, Scout's 10M token context window is ideal. For shorter documents where you need the highest quality extraction, summarization, or reasoning, Maverick's 128-expert architecture produces more nuanced and accurate results. Both support native image understanding for documents with charts, tables, and diagrams.

Llama 4 Family

Explore each model and compare with competitors

Dive deeper into each variant or see how the Llama 4 models compare against other frontier open models.

  • Llama 4 Scout: 10M context window specialist
  • Llama 4 Maverick: 128-expert flagship model
  • Llama 4 vs Kimi K2.6: Meta vs Moonshot comparison
  • Llama 4 vs Qwen 3.6: Meta vs Alibaba comparison
  • Llama 4 vs DeepSeek V4: MoE architecture showdown
  • Llama 4 vs MiniMax M2.7: scale vs efficiency

Get started

Find your ideal option among the Llama 4 models

Start chatting with either variant for free, or download weights for local deployment. Both are open-weight and ready to use.