Model Report
Mistral AI / Mistral Small 3 (2501)
Mistral Small 3 (2501) is a 24-billion-parameter instruction-fine-tuned language model developed by Mistral AI and released under an Apache 2.0 license. The model features a 32,000-token context window, multilingual capabilities across eleven languages, and competitive performance on benchmarks such as MMLU Pro and HumanEval, as well as on instruction-following tasks, while maintaining efficient inference speeds.
Mistral Small 3 (2501) is a 24-billion-parameter instruction-fine-tuned large language model (LLM) developed by Mistral AI. Released on January 30, 2025, Mistral Small 3 sits in the "small" LLM category (models with fewer than 70 billion parameters), combining strong instruction following and low latency with competitive benchmark results across a variety of natural language tasks. The model is built upon the Mistral-Small-24B-Base-2501 foundation, emphasizes efficiency and versatility, and adopts an Apache 2.0 license for broad accessibility.
A scatter plot of MMLU-Pro accuracy against latency (ms/token) shows Mistral Small 3 positioned in an optimal region relative to peer models.
Mistral Small 3 comprises 24 billion parameters and utilizes an architecture explicitly designed for high efficiency and responsiveness. It features a streamlined configuration with fewer layers than many competing models, which can reduce time per forward pass. The model offers a 32,000-token context window, allowing it to effectively handle extended conversations or analyses. Multilingual capabilities are intrinsic to the design; Mistral Small 3 can process and generate text in languages including English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, and Polish.
Instruction fine-tuning is a core aspect of this model, with the training pipeline avoiding both reinforcement learning and synthetic data, as documented in the official announcement. The model uses the Tekken tokenizer, which has a 131,000-token vocabulary. The training corpus covers data up to October 2023, with subsequent instruction tuning designed to maximize generalist performance.
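As a quick illustration, the layer count, context window, and Tekken vocabulary can be inspected with Hugging Face Transformers. This is a minimal sketch, assuming the instruct weights are published under the Hugging Face ID `mistralai/Mistral-Small-24B-Instruct-2501`; substitute a local path or mirror as needed.

```python
# Minimal sketch: inspect the published config and tokenizer with Transformers.
# The model ID below is an assumption based on Mistral's Hugging Face naming.
from transformers import AutoConfig, AutoTokenizer

MODEL_ID = "mistralai/Mistral-Small-24B-Instruct-2501"  # assumed repo name

config = AutoConfig.from_pretrained(MODEL_ID)
print(config.num_hidden_layers)        # streamlined layer count
print(config.max_position_embeddings)  # ~32k-token context window

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
print(len(tokenizer))  # Tekken vocabulary, ~131,000 tokens

# Multilingual text tokenizes natively:
print(tokenizer.tokenize("Bonjour, wie geht's? こんにちは"))
```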
Bar chart comparing pretraining performance of Mistral-Small-24B-Base-2501 against [Gemma 2 27B](https://openlaboratory.ai/models/gemma-2-27b), [Qwen 2.5 32B](https://openlaboratory.ai/models/qwen-2_5-32b), and [Llama 3.1 70B](https://openlaboratory.ai/models/llama3_1-70b) across multiple accuracy-based benchmarks.
Mistral Small 3 demonstrates competitive performance in both human evaluations and standardized public benchmarks. In human preference assessments conducted by external third-party vendors across more than 1,000 prompts, responses from Mistral Small 3 were preferred over those from other models, including Gemma-2 27B, Qwen-2.5 32B, Llama-3.3 70B, and GPT-4o-mini.
Human rater preferences favor Mistral Small 3, particularly against [Gemma-2 27B](https://openlaboratory.ai/models/gemma-2-27b) and [Qwen-2.5 32B](https://openlaboratory.ai/models/qwen-2_5-32b), as observed in this benchmark comparison.
Quantitative benchmarks support these findings. On the MMLU Pro (5-shot) and GPQA main benchmarks, Mistral Small 3 achieves scores comparable to or exceeding those of other models in its category:
The model's performance on the general-knowledge benchmarks MMLU Pro (5-shot) and GPQA main, visualized against peer models.
For code generation and mathematical reasoning, Mistral Small 3 demonstrates competitive results in HumanEval and Math Instruct evaluations:
Benchmark results for Mistral Small 3 on code and math tasks, shown alongside models such as [Llama-3.3 70B](https://openlaboratory.ai/models/llama3_3-70b) and [Qwen-2.5 32B](https://openlaboratory.ai/models/qwen-2_5-32b).
Additionally, multilingual capabilities are highlighted in evaluations on language-specific MMLU benchmarks:
The bar chart illustrates the multilingual MMLU benchmark results, where Mistral Small 3 exhibits consistent accuracy across several non-English languages.
Mistral Small 3 is optimized for low-latency inference and can produce up to 150 tokens per second. Its efficient architecture suits a variety of deployment scenarios, including local and edge environments. When quantized, the model can run on a single commercial GPU or on consumer hardware, making it suitable for privacy-sensitive and latency-critical applications. The model follows the V7-Tekken instruct template, supporting advanced system prompting and consistent instruction following.
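To make the template concrete, the rendered prompt can be inspected through the tokenizer's chat template in Transformers. A minimal sketch, again assuming the `mistralai/Mistral-Small-24B-Instruct-2501` Hugging Face ID:

```python
# Minimal sketch: render a conversation with the model's chat template.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "mistralai/Mistral-Small-24B-Instruct-2501"  # assumed repo name
)

messages = [
    {"role": "system", "content": "You are a concise technical assistant."},
    {"role": "user", "content": "Summarize the Apache 2.0 license in one sentence."},
]

# add_generation_prompt=True appends the marker after which the model replies;
# the rendered string follows the V7-Tekken layout, roughly
# <s>[SYSTEM_PROMPT]...[/SYSTEM_PROMPT][INST]...[/INST]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```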
The recommended inference temperature for production settings is 0.15. System prompts can be formatted so that the model reliably acknowledges uncertainty and requests clarification from users when required. For practical deployment, the model is supported by frameworks such as vLLM and Transformers and is compatible with Ollama for local inference.
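The snippet below sketches this in vLLM with the recommended temperature of 0.15. The model ID and the `tokenizer_mode="mistral"` option follow Mistral's published vLLM usage and are assumptions here; serving the 24B weights typically requires a large-VRAM GPU or quantization.

```python
# Minimal vLLM sketch using the recommended production temperature of 0.15.
from vllm import LLM, SamplingParams

llm = LLM(
    model="mistralai/Mistral-Small-24B-Instruct-2501",  # assumed repo name
    tokenizer_mode="mistral",  # use Mistral's native tokenizer handling
)

params = SamplingParams(temperature=0.15, max_tokens=256)

messages = [
    {"role": "system", "content": "Answer briefly; say so when you are unsure."},
    {"role": "user", "content": "What is this model's context window?"},
]

# llm.chat applies the chat template before generation.
outputs = llm.chat(messages, params)
print(outputs[0].outputs[0].text)
```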
Applications and Use Cases
Mistral Small 3 addresses a broad range of generative AI applications. Its capabilities, including rapid instruction following, conversational engagement, and agentic workflows, make it a suitable foundation for virtual assistants, subject-matter specialists built through domain-specific fine-tuning, and automated function-calling agents within complex systems. The capacity for local inference supports scenarios with heightened privacy requirements, such as internal organizational workflows or industries handling sensitive data, including finance, healthcare, manufacturing, and customer service.
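For the privacy-sensitive, on-premises scenarios described above, a hedged sketch of local inference through Ollama's Python client follows. It assumes a running Ollama server and that the model has been pulled locally; the `mistral-small` tag is an assumption and may differ in your Ollama library.

```python
# Hedged sketch: local inference via the Ollama Python client.
# Assumes `ollama serve` is running and the model was pulled beforehand,
# e.g. `ollama pull mistral-small` (tag name is an assumption).
import ollama

response = ollama.chat(
    model="mistral-small",
    messages=[
        {"role": "system", "content": "You are an internal compliance assistant."},
        {"role": "user", "content": "Draft a one-line data-retention reminder."},
    ],
    options={"temperature": 0.15},  # matches the recommended production setting
)
print(response["message"]["content"])
```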
Limitations
While Mistral Small 3 provides generalist performance, its training pipeline does not use reinforcement learning from human feedback (RLHF) or synthetic data, as noted in Mistral's technical release. This design choice places the model at an earlier stage of the generative model pipeline, emphasizing reproducibility and interpretability and leaving further alignment to downstream adopters.
Licensing and Model Access
Mistral Small 3 is distributed under the Apache 2.0 License, permitting both commercial and non-commercial use and modification. This license aligns with Mistral AI's stated intention to contribute general-purpose models to the open-source community. The model and associated resources are available via Mistral's public repositories and documentation.