Llama 4 Maverick
400B parameters, 128 experts - Meta's most capable open model
Llama 4 Maverick is the first open-weight model to consistently outperform GPT-4o across multiple benchmark categories. With 400B total parameters routed through 128 specialized experts and only 17B active per token, it delivers frontier-class reasoning, coding, and multimodal understanding without the cost of a proprietary API. Whether you need advanced code generation, scientific analysis, or image understanding, Llama 4 Maverick brings the quality of closed-source leaders to an open-weight package you can deploy anywhere.
Model variants
Instruction-tuned and base models
Choose between the instruction-tuned variant optimized for chat and complex tasks, or the base model for fine-tuning and research.
128-Expert MoE Architecture
400B total parameters, 17B active per token
Maverick scales to 128 experts from Scout's 16, packing 400B total parameters while keeping the same 17B active footprint per token. This gives it significantly stronger reasoning, coding, and multimodal capabilities.
The default chat model on this site. Best for tasks requiring maximum quality: complex reasoning, code generation, multimodal analysis, and research synthesis.
Instruction-tuned
Maverick Instruct
Optimized for conversational AI, complex reasoning, and code generation
Fine-tuned with RLHF for following instructions and multi-turn dialogue
Pre-trained
Maverick Base
Foundation MoE model for fine-tuning and specialized applications
Pre-trained on diverse multimodal data with 128-expert routing
Capabilities
Frontier performance from Llama 4 Maverick
Llama 4 Maverick combines 128-expert MoE efficiency with advanced reasoning, strong coding, and native multimodal understanding. Every capability is tuned for maximum quality at 17B active parameters per token, making it a practical alternative to proprietary frontier models.
128-expert MoE
Routes each token through specialized experts from a pool of 128. The 400B total parameters deliver frontier quality at only 17B inference cost per token. This architecture means each expert can develop deep specialization in specific domains, from mathematics to creative writing, resulting in consistently high quality across diverse tasks.
Advanced reasoning
Strong performance on MMLU Pro (80.5%) and GPQA Diamond (69.8%) demonstrates deep knowledge and scientific reasoning. Llama 4 Maverick handles multi-step logic, mathematical proofs, and complex analytical tasks with accuracy that matches or exceeds proprietary alternatives. The 128-expert architecture allows different experts to contribute specialized knowledge at each reasoning step.
Code generation
Outperforms GPT-4o on coding benchmarks including LiveCodeBench v5. Llama 4 Maverick generates production-ready code across dozens of programming languages, debugs complex issues, and explains algorithmic approaches clearly. Native function calling enables agentic workflows where the model can autonomously execute code, call APIs, and chain tool operations.
1M token context
Process long documents, codebases, and extended conversations within a 1 million token context window. While Scout offers 10M tokens for extreme long-context tasks, the 1M window in Llama 4 Maverick is sufficient for most production use cases including full project analysis, lengthy research papers, and multi-turn conversations that span hundreds of exchanges.
Native multimodal
Early fusion architecture processes text and images together natively from the ground up. Analyze screenshots, diagrams, charts, technical drawings, and documents alongside text without separate vision pipelines. Llama 4 Maverick scores 73.4% on MMMU, demonstrating strong visual reasoning that rivals dedicated vision models.
Multilingual
Strong performance across multiple languages makes Llama 4 Maverick suitable for global applications. The model handles translation, cross-lingual reasoning, and culturally nuanced content generation with consistent quality. Whether your users communicate in English, Chinese, Spanish, French, or other supported languages, the output quality remains high.
Key highlights
Why Llama 4 Maverick stands out
Llama 4 Maverick is the first open-weight model to consistently beat GPT-4o across multiple benchmark categories.
Benchmark highlights
- MMLU Pro 80.5% - competitive with frontier proprietary models
- GPQA Diamond 69.8% - strong scientific reasoning
- MMMU 73.4% - excellent multimodal understanding
- Outperforms GPT-4o on coding benchmarks
- Arena ELO competitive with top-tier models
Technical specs
- 400B total parameters, 17B active per token
- 128 experts in MoE architecture
- 1M token context window
- Native multimodal (text + image)
- Llama 3.1 compatible license
Performance
Frontier quality from Llama 4 Maverick
Llama 4 Maverick achieves 80.5% on MMLU Pro and 73.4% on MMMU, outperforming GPT-4o on multiple benchmarks while activating only 17B parameters per token.
The benchmark results tell a compelling story, but real-world performance is where Llama 4 Maverick truly proves itself. Developers report that code generation quality rivals the best proprietary models, with fewer hallucinations and more accurate function implementations. Researchers find that scientific reasoning tasks produce well-structured, citation-aware responses. The 128-expert architecture means the model can draw on deeply specialized knowledge for each subtask, resulting in outputs that feel like they come from a domain expert rather than a generalist.
MMLU Pro 80.5% - frontier-class knowledge and reasoning
GPQA Diamond 69.8% - strong scientific reasoning
MMMU 73.4% - excellent multimodal understanding
Outperforms GPT-4o on coding benchmarks
17B active parameters from 400B total (128 experts)
Benchmark comparison
Maverick vs Scout and previous generation
Maverick's 128-expert architecture delivers significant improvements over Scout and Llama 3.1 across all categories.
| Benchmark | Llama 4 Maverick 128 experts Featured | Llama 4 Scout 16 experts | Llama 3.1 70B Dense | GPT-4o Proprietary |
|---|---|---|---|---|
MMLU Pro Knowledge & reasoning | 80.5% | 74.3% | 66.4% | 78.4% |
GPQA Diamond Scientific knowledge | 69.8% | 57.2% | 46.7% | 53.6% |
LiveCodeBench v5 Coding | 43.4% | 32.8% | 28.5% | 37.0% |
MMMU Multimodal | 73.4% | 69.4% | - | 69.1% |
Context Window Max tokens | 1M | 10M | 128K | 128K |
Total Parameters Model size | 400B | 109B | 70B | - |
Active Parameters Per token | 17B | 17B | 70B | - |
Data from Meta's official model card and independent evaluations.
128-Expert Scale
How Llama 4 Maverick delivers 400B capacity at 17B cost
The 128-expert MoE architecture in Llama 4 Maverick is a significant scale-up from Scout's 16 experts. Each token is routed to specialized experts, giving the model access to 400B parameters of knowledge while only activating 17B per forward pass. This design enables frontier-class quality without frontier-class compute requirements.
- 128 experts vs Scout's 16 for 8x more specialization per token
- 400B total parameters vs Scout's 109B for deeper knowledge capacity
- Same 17B active parameter cost per token as Scout for efficient inference
- Each expert develops deep domain specialization during training
- Sparse routing ensures optimal expert selection for every input
Multimodal
Native image understanding in Llama 4 Maverick
Llama 4 Maverick uses early fusion architecture to process text and images together natively. Visual understanding is built into the model from the ground up, not bolted on as a separate module. This results in seamless reasoning across both modalities with strong performance on visual benchmarks.
- 73.4% on MMMU multimodal benchmark, surpassing GPT-4o's 69.1%
- Early fusion architecture for native multimodal processing without separate pipelines
- Analyze screenshots, diagrams, charts, and technical documents with precision
- Combine visual analysis with code generation for UI development workflows
- Process mixed content documents containing both text and embedded images
Coding
Coding and function calling with Llama 4 Maverick
Llama 4 Maverick outperforms GPT-4o on coding benchmarks and includes native function calling for building autonomous agent workflows. Whether you need to generate production code, debug complex issues, or build tool-using agents, the 128-expert architecture provides specialized knowledge across programming languages and frameworks.
- 43.4% on LiveCodeBench v5, exceeding GPT-4o's 37.0% on the same benchmark
- Native function calling enables autonomous agent workflows without fine-tuning
- Generate production-ready code across Python, JavaScript, TypeScript, Rust, and more
- Debug complex multi-file issues with full context awareness across your codebase
- Chain multiple tool calls for end-to-end task automation in agentic applications
Get started
Try Llama 4 Maverick now
Start chatting instantly or download weights for self-hosted deployment.
Download & deploy
Self-hosted deployment
Download official model weights for deployment on your infrastructure.
FAQ
Frequently asked questions about Llama 4 Maverick
Answers to the most common questions about performance, deployment, and practical usage of Llama 4 Maverick.
Yes. Llama 4 Maverick outperforms GPT-4o on several key benchmarks. It scores 80.5% on MMLU Pro compared to GPT-4o's 78.4%, 69.8% on GPQA Diamond versus 53.6%, and 43.4% on LiveCodeBench v5 versus 37.0%. On multimodal tasks, it achieves 73.4% on MMMU compared to GPT-4o's 69.1%. These results come from Meta's official evaluations and independent testing.
Running Llama 4 Maverick at full precision requires approximately 800 GB of VRAM, which typically means a cluster of 8 or more A100 80 GB GPUs. With INT8 quantization, you can reduce this to around 400 GB (roughly 5 A100 GPUs). INT4 quantization brings it down further to approximately 200 GB. Cloud providers also offer hosted API access if local deployment is not practical for your setup.
The 128-expert mixture of experts architecture allows Llama 4 Maverick to store 400B parameters of knowledge while only activating 17B per token during inference. Each expert develops deep specialization during training, so the routing mechanism can select the most relevant experts for each input. This gives the model the knowledge depth of a 400B dense model at a fraction of the compute cost.
Yes. Llama 4 Maverick is released under the Llama 3.1 compatible license, which permits commercial use. You can build products, deploy services, and fine-tune the model for your specific business needs. The license includes usage thresholds for very large-scale deployments, so review the full terms if your application serves hundreds of millions of monthly active users.
Llama 4 Maverick uses early fusion architecture, meaning image understanding is built into the model from the ground up rather than added as a separate vision encoder. It processes text and images in a unified stream, enabling natural reasoning across both modalities. It scores 73.4% on MMMU, demonstrating strong performance on tasks that require understanding charts, diagrams, screenshots, and documents.
Several cloud providers offer hosted API access to Llama 4 Maverick, including services on AWS, Google Cloud, Azure, and specialized inference platforms like Together AI, Fireworks, and Groq. You can also self-host using frameworks like vLLM or TGI. For quick experimentation, the chat interface on this site runs Llama 4 Maverick as the default model with no setup required.
Llama 4 Family
Explore the full Llama 4 lineup
Maverick is Meta's flagship open model. Compare it with Scout and see how it stacks up against other frontier models.
Get started
Ready to try Llama 4 Maverick?
Start chatting instantly for free. Maverick is the default model on this site - no setup required.