Note: Mistral Small (2409) weights are released under a Mistral AI Research License and cannot be used for commercial purposes. Please read the license to verify whether your use case is permitted.
The simplest way to self-host Mistral Small (2409). Launch a dedicated cloud GPU server running Lab Station OS to download and serve the model using any compatible app or framework.
Download model weights for local inference. Must be used with a compatible app, notebook, or codebase. May run slowly, or not work at all, depending on your system resources, particularly GPU(s) and available VRAM.
Mistral Small v24.09 is a 22B-parameter language model with a 32k-token context window and multilingual support across 10 languages. It bridges Mistral's 12B and Large models, featuring improved reasoning and code capabilities. Weights ship in the Safetensors format, with tensor parallelism recommended for multi-GPU inference.
Mistral Small v24.09 represents a significant advancement in enterprise-grade language models, featuring 22 billion parameters and delivering improved performance across multiple domains. This instruction-tuned large language model (LLM) sits strategically between Mistral NeMo 12B and Mistral Large 2, offering an optimal balance of capability and efficiency.
The model has a vocabulary size of 32,768 and can process sequences up to 32,000 tokens long. It supports function calling and is available in the Safetensors format. The architecture requires substantial computational resources: running inference on a single GPU demands at least 44 GB of GPU RAM. For optimal performance, tensor parallelism is recommended to distribute the processing load across multiple devices.
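The 44 GB figure follows directly from the parameter count: 22 billion parameters stored in 16-bit precision occupy roughly 44 GB before activations and the KV cache are counted. A back-of-envelope sketch (an illustration, not an exact profile of any particular runtime):

```python
# Rough weight-memory estimate for a 22B-parameter model.
# Assumes 16-bit (bf16/fp16) weights; activations and KV cache add overhead on top.
params = 22e9          # 22 billion parameters
bytes_per_param = 2    # bfloat16 / float16
weight_gb = params * bytes_per_param / 1e9
print(f"Weights alone: ~{weight_gb:.0f} GB")  # ~44 GB

# Tensor parallelism shards the weights across devices:
for n_gpus in (1, 2, 4):
    print(f"{n_gpus} GPU(s): ~{weight_gb / n_gpus:.0f} GB of weights per device")
```

This is also why tensor parallelism is recommended: splitting the weights across several devices reduces the per-device footprint proportionally, bringing the model within reach of common GPU configurations.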
The model demonstrates impressive multilingual capabilities, supporting ten languages: English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Russian, and Korean. This makes it particularly valuable for international applications and cross-language tasks.
Mistral Small v24.09 shows marked improvements over its predecessor (v24.02) in several key areas, including human alignment, reasoning capabilities, and code handling. These enhancements are documented in the release announcement, which includes comprehensive benchmark comparisons.
The model excels across a broad range of tasks, building on these gains in reasoning, code handling, and multilingual understanding.
For implementation, the recommended approach is the vLLM library (version 0.6.1.post1 or later), which is designed for production-ready inference pipelines. The model can also be deployed with the mistral-inference library or the Hugging Face transformers library, offering flexibility in implementation approaches.
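As a minimal sketch of the vLLM path (assuming vLLM 0.6.1.post1 or later and the public Hugging Face checkpoint ID mistralai/Mistral-Small-Instruct-2409; verify both against the model card for your setup):

```python
# Offline chat inference with vLLM; the flags follow Mistral's published usage,
# but treat the model ID and settings here as assumptions to double-check.
from vllm import LLM
from vllm.sampling_params import SamplingParams

llm = LLM(
    model="mistralai/Mistral-Small-Instruct-2409",
    tokenizer_mode="mistral",   # use Mistral's native tokenizer
    config_format="mistral",
    load_format="mistral",
    tensor_parallel_size=2,     # shard the 22B weights across two GPUs
)

messages = [
    {"role": "user", "content": "Explain function calling in one paragraph."},
]
outputs = llm.chat(messages, sampling_params=SamplingParams(max_tokens=256))
print(outputs[0].outputs[0].text)
```

On a single GPU with enough VRAM, tensor_parallel_size can be omitted; the same checkpoint also loads through mistral-inference or Hugging Face transformers if vLLM is not an option.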
The model is distributed under the Mistral AI Research License (MRL), which restricts usage to non-commercial research purposes. Commercial applications require a separate license directly from Mistral AI. Developers and researchers can view the full license terms for detailed information about usage rights and restrictions.