Browse Models
The simplest way to self-host Mixtral 8x22B. Launch a dedicated cloud GPU server running Lab Station OS to download and serve the model using any compatible app or framework.
Download model weights for local inference. Must be used with a compatible app, notebook, or codebase. May run slowly, or not work at all, depending on your system resources, particularly GPU(s) and available VRAM.
Mixtral 8x22B is a Sparse Mixture of Experts model with 141B total parameters, of which only 39B are active during inference. It features 8 expert blocks per layer with dynamic routing, strong multilingual capabilities, and a 64K context window. It is notable for mathematical reasoning, scoring 90.8% on GSM8K.
Mixtral 8x22B is a pretrained generative Sparse Mixture of Experts (MoE) Large Language Model developed by Mistral AI. Of its 141 billion total parameters, only 39 billion are active for any given token during inference, which Mistral AI's announcement highlights as the source of the model's cost efficiency. The weights are distributed in BF16 and sharded across multiple safetensors files.
The model builds on the architecture demonstrated by Mixtral 8x7B, in which each layer contains 8 feedforward blocks (experts) and a router network selects two experts to process each token. While Mixtral 8x7B has 47B total parameters with 13B active during inference, Mixtral 8x22B scales this approach significantly with its 141B-parameter architecture.
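To make the routing idea concrete, here is a minimal sketch of a sparse MoE layer with top-2 routing in PyTorch. The dimensions, module names, and the simple per-expert loop are illustrative assumptions for exposition, not Mixtral's actual implementation.

```python
# Illustrative sketch of top-2 expert routing; sizes and names are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    def __init__(self, hidden_dim=1024, ffn_dim=4096, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Router: one logit per expert for each token.
        self.router = nn.Linear(hidden_dim, num_experts, bias=False)
        # Experts: independent feedforward blocks; only top_k run per token.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_dim, ffn_dim),
                nn.SiLU(),
                nn.Linear(ffn_dim, hidden_dim),
            )
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (num_tokens, hidden_dim)
        logits = self.router(x)                                   # (tokens, experts)
        weights, chosen = torch.topk(logits, self.top_k, dim=-1)  # pick 2 experts per token
        weights = F.softmax(weights, dim=-1)                      # normalize the two scores
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e
                if mask.any():
                    # Each token's output is a weighted sum of its two chosen experts.
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out
```

Because only two experts run per token, the compute per token tracks the active-parameter count (39B) rather than the total parameter count (141B), even though all weights must still reside in memory.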
Mixtral 8x22B has strong multilingual capabilities and is fluent in English, French, Italian, German, and Spanish. Its 64K-token context window enables information retrieval from long documents, and it supports native function calling, making it well suited to complex tasks requiring structured outputs.
In benchmark testing, Mixtral 8x22B performs strongly across a range of tasks. The instruction-tuned version achieves 90.8% on GSM8K (maj@8) and 44.6% on MATH (maj@4), demonstrating strong mathematical reasoning. The model also consistently outperforms other open-source models, particularly on multilingual benchmarks.
The model is compatible with both the vLLM serving project and the Hugging Face Transformers library, though file formats and parameter names differ from the original torrent release. Because Mixtral 8x22B is a base model with no built-in moderation mechanisms, deployments should add their own safeguards.
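As a rough illustration of the vLLM path, the snippet below loads the base model from a Hugging Face repository and generates a completion offline. The repository id, tensor_parallel_size, and sampling settings are assumptions that depend on your hardware and the model revision you use.

```python
# Hedged sketch: offline generation with vLLM. The repo id and parallelism
# value are placeholders; a 141B-parameter model needs multiple high-memory
# GPUs even though only 39B parameters are active per token.
from vllm import LLM, SamplingParams

llm = LLM(
    model="mistralai/Mixtral-8x22B-v0.1",  # assumed Hugging Face repo id
    tensor_parallel_size=8,                # split weights across 8 GPUs (example value)
)
params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["The Mixture of Experts architecture works by"], params)
print(outputs[0].outputs[0].text)
```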
Users can access the model through the Hugging Face Transformers library (listed as available in the US region). A typical workflow loads the tokenizer and model, prepares text inputs, generates outputs, and decodes the results, as sketched below. Released under the Apache 2.0 license, the model has attracted significant attention, with over 3.9 million downloads reported on the Hugging Face platform.
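The following is a minimal Transformers sketch of that workflow. The repository id, dtype, and generation settings are illustrative assumptions, and in practice the BF16 weights of a 141B-parameter model require substantial GPU memory or offloading.

```python
# Hedged sketch of basic inference with Hugging Face Transformers.
# Repo id and generation settings are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x22B-v0.1"  # assumed repo id for the base model

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are distributed in BF16
    device_map="auto",           # spread layers across available GPUs/CPU
)

prompt = "Mixtral 8x22B is a sparse mixture of experts model that"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Since this is the base (non-instruct) model, prompts are plain text completions rather than chat-formatted messages.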