Llama 4 Maverick
400B parameters, 128 experts - Meta's most capable open model
Llama 4 Maverick is Meta's flagship MoE model. With 400B total parameters routed through 128 experts and only 17B active per token, it delivers frontier-class performance that beats GPT-4o on key benchmarks while remaining fully open-weight.
Model variants
Instruction-tuned and base models
Choose between the instruction-tuned variant optimized for chat and complex tasks, or the base model for fine-tuning and research.
128-Expert MoE Architecture
400B total parameters, 17B active per token
Maverick scales to 128 experts from Scout's 16, packing 400B total parameters while keeping the same 17B active footprint per token. This gives it significantly stronger reasoning, coding, and multimodal capabilities.
The default chat model on this site. Best for tasks requiring maximum quality: complex reasoning, code generation, multimodal analysis, and research synthesis.
Instruction-tuned
Maverick Instruct
Optimized for conversational AI, complex reasoning, and code generation
Post-trained with SFT, online RL, and DPO for instruction following and multi-turn dialogue
Pre-trained
Maverick Base
Foundation MoE model for fine-tuning and specialized applications
Pre-trained on diverse multimodal data with 128-expert routing
Capabilities
Frontier performance from an open-weight model
Llama 4 Maverick combines 128-expert MoE efficiency with advanced reasoning, strong coding, and native multimodal understanding - all at 17B active parameters per token.
128-expert MoE
Routes each token through specialized experts from a pool of 128. 400B total parameters deliver frontier quality at 17B inference cost per token.
Advanced reasoning
Strong performance on MMLU Pro (80.5%) and GPQA Diamond (69.8%). Competitive with proprietary models on complex reasoning tasks.
Code generation
Outperforms GPT-4o on coding benchmarks. Native function calling enables agentic workflows and autonomous code execution (see the tool-calling sketch below).
1M token context
Process long documents, codebases, and extended conversations. A 1M-token window covers most production use cases.
Native multimodal
Early fusion architecture processes text and images together natively. Analyze screenshots, diagrams, and documents alongside text.
Multilingual
Strong performance across multiple languages. Built for global applications with cultural context understanding.
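To ground the function-calling claim above, here is a hedged sketch using transformers' tool-aware chat templating. The model id and the `get_weather` helper are illustrative assumptions, not part of Meta's documentation; transformers derives a JSON tool schema from the function signature and Google-style docstring.

```python
from transformers import AutoTokenizer

def get_weather(city: str) -> str:
    """Get the current weather for a city.

    Args:
        city: Name of the city to look up.
    """
    return "sunny, 21C"  # placeholder implementation

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-4-Maverick-17B-128E-Instruct")
prompt = tok.apply_chat_template(
    [{"role": "user", "content": "What's the weather in Paris?"}],
    tools=[get_weather],  # the chat template injects the tool spec
    add_generation_prompt=True,
    tokenize=False,
)
print(prompt)  # inspect how the model is asked to emit a tool call
```

The model responds with a structured tool call that your runtime executes, appending the result as a `tool` message before the next generation turn.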
Key highlights
Why Maverick stands out
Maverick is among the first open-weight models to consistently beat GPT-4o across multiple benchmark categories.
Benchmark highlights
- MMLU Pro 80.5% - competitive with frontier proprietary models
- GPQA Diamond 69.8% - strong scientific reasoning
- MMMU 73.4% - excellent multimodal understanding
- Outperforms GPT-4o on coding benchmarks
- Arena Elo competitive with top-tier models
Technical specs
- 400B total parameters, 17B active per token
- 128 experts in MoE architecture
- 1M token context window
- Native multimodal (text + image)
- Released under the Llama 4 Community License
Performance
Frontier quality from an open-weight MoE model
Llama 4 Maverick achieves 80.5% on MMLU Pro and 73.4% on MMMU, outperforming GPT-4o on multiple benchmarks while activating only 17B parameters per token.
Maverick demonstrates that open-weight models can compete with the best proprietary offerings. Its 128-expert architecture delivers consistent excellence across reasoning, coding, and multimodal tasks.
MMLU Pro 80.5% - frontier-class knowledge and reasoning
GPQA Diamond 69.8% - strong scientific reasoning
MMMU 73.4% - excellent multimodal understanding
Outperforms GPT-4o on coding benchmarks
17B active parameters from 400B total (128 experts)
Benchmark comparison
Maverick vs Scout and previous generation
Maverick's 128-expert architecture delivers significant improvements over Scout and Llama 3.1 across all categories.
| Benchmark | Llama 4 Maverick (128 experts) | Llama 4 Scout (16 experts) | Llama 3.1 70B (dense) | GPT-4o (proprietary) |
|---|---|---|---|---|
| MMLU Pro (knowledge & reasoning) | 80.5% | 74.3% | 66.4% | 78.4% |
| GPQA Diamond (scientific knowledge) | 69.8% | 57.2% | 46.7% | 53.6% |
| LiveCodeBench v5 (coding) | 43.4% | 32.8% | 28.5% | 37.0% |
| MMMU (multimodal) | 73.4% | 69.4% | - | 69.1% |
| Context window (max tokens) | 1M | 10M | 128K | 128K |
| Total parameters (model size) | 400B | 109B | 70B | - |
| Active parameters (per token) | 17B | 17B | 70B | - |
Data from Meta's official model card and independent evaluations.
128-Expert Scale
400B capacity at 17B inference cost
Maverick's 128-expert MoE architecture is a significant scale-up from Scout's 16 experts. Each token is routed to specialized experts, giving the model access to 400B parameters of knowledge while only activating 17B per forward pass.
- 128 experts vs Scout's 16 - 8x more specialization
- 400B total parameters vs Scout's 109B
- Same 17B active parameter cost per token as Scout
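To make the routing idea concrete, here is a minimal sketch of top-k expert routing in PyTorch. It is illustrative only, not Meta's implementation: Llama 4 alternates dense and MoE layers and sends each token to a shared expert plus one routed expert, and the dimensions below are toy values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKRoutedMoE(nn.Module):
    def __init__(self, d_model: int = 512, n_experts: int = 128, k: int = 1):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)  # scores each token per expert
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.SiLU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (n_tokens, d_model). Only k experts run per token, so compute
        # scales with k, not with the total expert count.
        scores = self.router(x)                     # (n_tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)  # pick k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e in idx[:, slot].unique():
                mask = idx[:, slot] == e            # tokens routed to expert e
                out[mask] += weights[mask, slot, None] * self.experts[int(e)](x[mask])
        return out
```

With k=1, per-token compute tracks a single expert's MLP, which is how a 400B-parameter pool can run at a 17B active cost.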
Multimodal
Native text and image understanding
Maverick uses an early-fusion architecture to process text and images together natively. Visual understanding is built into the model from the ground up, not bolted on as a separate module.
- 73.4% on MMMU multimodal benchmark
- Early fusion architecture for native multimodal processing
- Analyze screenshots, diagrams, charts, and documents
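As a rough sketch of the multimodal path, the transformers integration can pass an image and text through one chat template. The class name, model id, and image URL below are assumptions to verify against the current model card and a transformers release that includes Llama 4 support.

```python
from transformers import AutoProcessor, Llama4ForConditionalGeneration

model_id = "meta-llama/Llama-4-Maverick-17B-128E-Instruct"  # assumed HF repo id
processor = AutoProcessor.from_pretrained(model_id)
model = Llama4ForConditionalGeneration.from_pretrained(model_id, device_map="auto")

# One user turn mixing an image and a text question (URL is a placeholder).
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/chart.png"},
        {"type": "text", "text": "Summarize the trend shown in this chart."},
    ],
}]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```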
Get started
Try Llama 4 Maverick now
Start chatting instantly or download weights for self-hosted deployment.
Download & deploy
Self-hosted deployment
Download official model weights for deployment on your infrastructure.
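For a self-hosted start, a minimal serving sketch with vLLM's Python API might look like this. The model id, GPU count, and memory budget are illustrative assumptions: a BF16 400B checkpoint needs several hundred GB of GPU memory, and Meta also publishes quantized variants that fit smaller nodes.

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-4-Maverick-17B-128E-Instruct",  # assumed repo id
    tensor_parallel_size=8,  # shard weights (and experts) across 8 GPUs
)
params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(
    ["Explain mixture-of-experts routing in two sentences."], params
)
print(outputs[0].outputs[0].text)
```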
Llama 4 Family
Explore the full Llama 4 lineup
Maverick is Meta's flagship open model. Compare it with Scout and see how it stacks up against other frontier models.
Get started
Ready to try Llama 4 Maverick?
Start chatting instantly for free. Maverick is the default model on this site - no setup required.