Mistral AI · Released December 11, 2023

Mixtral 8x7B

Open Source · 46.7B parameters (12.9B active per token)

Mixtral 8x7B is Mistral AI's open-weight sparse Mixture-of-Experts model. Context window: 32K tokens.

Context

32K

Input

Free (open)

Key Specifications

🏆

Arena Rank

Not disclosed

📐

Context Window

32K

📥

Input Price

per 1M tokens

Free (open)

📤

Output Price

per 1M tokens

Free (open)

🧠

Parameters

46.7B (12.9B active)

🔓

Open Source

Yes

Best For

Efficient inference · Multilingual · Coding

About Mixtral 8x7B

Mixtral 8x7B, developed by Mistral AI, is an open-source sparse Mixture-of-Experts model with 46.7 billion total parameters, of which roughly 12.9 billion are active per token, and a 32K-token context window. Each feed-forward layer contains eight experts, and a router sends every token to just two of them. The model pioneered the practical application of MoE architecture in open-source AI, demonstrating that sparse expert routing could deliver performance comparable to much larger dense models at a fraction of the inference cost. Mixtral 8x7B handles coding, reasoning, and multilingual tasks efficiently, activating only the most relevant experts for each input. Free and fully open-source, it runs on consumer-grade multi-GPU setups and has become a benchmark for efficient model design. Its success influenced subsequent MoE models from DeepSeek, Alibaba, and others, and it remains widely deployed in production for cost-sensitive applications that need better-than-7B performance.
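To make the routing idea concrete, here is a minimal, illustrative sketch of a top-2 sparse MoE feed-forward block in PyTorch. It is not Mistral's implementation: the real Mixtral experts are SwiGLU blocks with a hidden size of 4096 and an expert feed-forward size of 14336, while the tiny dimensions, class name, and plain SiLU MLP below are simplifications chosen to keep the example small and runnable.

```python
# Illustrative top-2 sparse Mixture-of-Experts feed-forward block.
# Simplified sketch, not Mistral's code: experts here are plain SiLU MLPs
# and the default dimensions are tiny so the example runs instantly.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoEBlock(nn.Module):
    def __init__(self, hidden_dim=64, ffn_dim=256, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden_dim, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_dim, ffn_dim, bias=False),
                nn.SiLU(),
                nn.Linear(ffn_dim, hidden_dim, bias=False),
            )
            for _ in range(num_experts)
        )

    def forward(self, x):                          # x: (num_tokens, hidden_dim)
        logits = self.router(x)                    # (num_tokens, num_experts)
        weights, chosen = torch.topk(logits, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)       # normalize over the 2 chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e        # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(4, 64)                        # 4 token embeddings
print(SparseMoEBlock()(tokens).shape)              # torch.Size([4, 64])
```

Because each token only passes through two of the eight experts, per-token compute stays close to that of a ~13B dense model even though far more parameters are held in memory.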

Pricing per 1M tokens

Input Tokens

Free (open)

Output Tokens

Free (open)

Frequently Asked Questions

What is Mixtral 8x7B?
Mixtral 8x7B is an open-source sparse Mixture-of-Experts model from Mistral AI with 46.7 billion total parameters (about 12.9 billion active per token) and a 32K-token context window. By routing each token to two of eight experts, it delivers performance comparable to much larger dense models at a fraction of the inference cost, and it remains widely deployed for cost-sensitive coding, reasoning, and multilingual workloads.
How much does Mixtral 8x7B cost?
Mixtral 8x7B's weights are free and open, so there is no license or per-token fee for the model itself. Your costs come from the compute you run it on when self-hosting, or from the per-token rates charged by hosted API providers that serve it.
What is Mixtral 8x7B's context window?
Mixtral 8x7B has a context window of 32K tokens. This determines how much text the model can process in a single request — bigger windows mean longer documents and richer conversation history.
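For a concrete sense of that limit, the sketch below counts prompt tokens against the 32K (32,768-token) window. It assumes the Hugging Face transformers library and the mistralai/Mixtral-8x7B-Instruct-v0.1 tokenizer from the Hub; the reserve_for_output margin is an arbitrary illustrative value.

```python
# Rough check that a prompt fits in Mixtral 8x7B's 32K-token context window.
# Assumes the `transformers` library and access to the Mixtral tokenizer on
# the Hugging Face Hub; any tokenizer for the model gives an equivalent count.
from transformers import AutoTokenizer

CONTEXT_WINDOW = 32_768  # 32K tokens

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1")

def fits_in_context(prompt: str, reserve_for_output: int = 1_024) -> bool:
    """True if the prompt leaves `reserve_for_output` tokens for the reply."""
    return len(tokenizer.encode(prompt)) + reserve_for_output <= CONTEXT_WINDOW

print(fits_in_context("Summarize the following report: ..."))
```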
Is Mixtral 8x7B open source?
Yes, Mixtral 8x7B is open source, released under the Apache 2.0 license. The model weights are publicly available, so developers can download, fine-tune, and self-host it. Open-source models give teams more control over data privacy and deployment.
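As a starting point, here is a minimal self-hosting sketch using the Hugging Face transformers library. It assumes you have access to the mistralai/Mixtral-8x7B-Instruct-v0.1 repository on the Hub, the accelerate package for device_map="auto", and enough GPU memory for the half-precision weights (roughly 90 GB; quantized variants are a common way to fit consumer GPUs).

```python
# Minimal self-hosting sketch for Mixtral 8x7B with Hugging Face transformers.
# Assumes access to the model repo on the Hub and sufficient GPU memory;
# device_map="auto" (via accelerate) spreads layers across available GPUs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Write a Python function that merges two sorted lists."}]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```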
What is Mixtral 8x7B best for?
Mixtral 8x7B is best suited for efficient inference, multilingual tasks, and coding. These use cases play to the model's strengths in capability, speed, and cost within Mistral AI's lineup.