The simplest way to self-host DeepSeek R1 Distill Qwen 32B. Launch a dedicated cloud GPU server running Laboratory OS to download and serve the model using any compatible app or framework.
Download model weights for local inference. Must be used with a compatible app, notebook, or codebase. May run slowly, or not work at all, depending on your system resources, particularly GPU(s) and available VRAM.
DeepSeek-R1-Distill-Qwen-32B is a 32B-parameter model that distills reasoning capabilities from a larger 671B-parameter MoE model. It uses a multi-stage training approach combining cold-start data with reinforcement learning, shows strong performance on mathematical and reasoning tasks, and supports a 32K-token context window.
DeepSeek-R1-Distill-Qwen-32B is a distilled language model that represents a significant advancement in efficient AI model design. Built upon the Qwen2.5-32B base model, it inherits its core architecture while incorporating distilled reasoning capabilities from the larger DeepSeek-R1 model. With 32.8 billion parameters in BF16 format and a context length of 32,768 tokens, it achieves state-of-the-art results for dense models.
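For reference, 32.8 billion parameters stored in BF16 (2 bytes per parameter) amount to roughly 66 GB for the weights alone, before KV cache and activations, which is why multi-GPU setups or offloading are typical. A minimal loading sketch with Hugging Face Transformers might look like the following; the repository ID follows the standard Hugging Face naming for this release, and device_map="auto" is an assumption about how the weights get sharded across available hardware.

```python
# Minimal sketch (not an official example): loading the released BF16 weights
# with Hugging Face Transformers. The weights alone occupy roughly
# 32.8e9 params * 2 bytes ~= 66 GB, so multiple GPUs or offloading are usually required.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # matches the BF16 release
    device_map="auto",           # assumption: shard / offload across available devices
)

messages = [{"role": "user", "content": "Explain why 0.1 + 0.2 != 0.3 in floating point."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Sampling settings follow the usage recommendations discussed below.
outputs = model.generate(
    inputs, max_new_tokens=1024, do_sample=True, temperature=0.6, top_p=0.95
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```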
The model is part of the broader DeepSeek-R1 family, which employs a novel multi-stage training pipeline. As detailed in the DeepSeek R1 Paper, the original DeepSeek-R1 uses a Mixture-of-Experts (MoE) architecture with 37B activated parameters and 671B total parameters. The distillation process successfully transfers the reasoning capabilities from this larger model to the more efficient 32B parameter version.
The training process for DeepSeek-R1-Distill-Qwen-32B represents a significant improvement over its predecessor, DeepSeek-R1-Zero. While DeepSeek-R1-Zero relied solely on reinforcement learning (RL), the newer model incorporates a more comprehensive pipeline that includes cold-start data before RL. This approach helps improve reasoning performance while avoiding issues like repetition and poor readability that were present in the earlier model.
The model demonstrates strong performance across a range of benchmarks, surpassing OpenAI's o1-mini on multiple metrics, with notable results on mathematical reasoning tests such as AIME 2024 and MATH-500, science questions in GPQA Diamond, and coding tasks such as LiveCodeBench and Codeforces.
For optimal performance, DeepSeek-R1-Distill-Qwen-32B should be run with a temperature in the range 0.5-0.7 (0.6 is recommended) and a top-p of 0.95; greedy decoding tends to produce repetition and incoherent output. All instructions, including any output-format requirements, should be placed in the user prompt rather than a system prompt.
The model can be deployed locally using frameworks such as vLLM or SGLang. When working with mathematical problems, prompts should include a directive such as "please reason step by step, and put your final answer within \boxed{}". For reliable evaluation, it is recommended to average results across multiple runs rather than relying on a single sample.
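A minimal sketch of how these recommendations could be combined with vLLM's offline Python API is shown below; the tensor-parallel degree, token budget, and example problem are illustrative assumptions rather than part of the official usage guide.

```python
# Sketch: offline inference with vLLM using the recommended sampling settings.
# tensor_parallel_size=2 is an assumption for a two-GPU machine; adjust to your hardware.
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

MODEL_ID = "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
llm = LLM(model=MODEL_ID, tensor_parallel_size=2, max_model_len=32768)

# Recommended settings: temperature ~0.6 and top-p 0.95, with all instructions
# placed directly in the user turn rather than a system prompt.
params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=8192)

question = (
    "Solve x^2 - 5x + 6 = 0. "
    "Please reason step by step, and put your final answer within \\boxed{}."
)
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": question}],
    tokenize=False,
    add_generation_prompt=True,
)

# Sampling several completions and averaging the outcomes gives a more
# reliable estimate of accuracy than a single run.
for run, output in enumerate(llm.generate([prompt] * 4, params)):
    print(f"--- run {run} ---")
    print(output.outputs[0].text)
```

Generating several samples per question, as in the loop above, smooths out the variance that temperature-based sampling introduces into benchmark scores.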
The code and model weights are released under the MIT License, allowing for commercial use and derivative works. However, users must also comply with the licenses of the underlying base models: the Qwen-based distillations, including this 32B model, derive from Qwen2.5, which is released under the Apache 2.0 License, while the Llama-based distillations are subject to the corresponding Llama community licenses.
DeepSeek AI has released several distilled models in the R1 family, ranging from 1.5B to 70B parameters. These include DeepSeek-R1-Distill-Qwen-1.5B, 7B, 14B, and 32B, built on Qwen2.5 base models, as well as DeepSeek-R1-Distill-Llama-8B and DeepSeek-R1-Distill-Llama-70B, built on Llama 3.1 and Llama 3.3 respectively.
Each variant offers different trade-offs between performance and resource requirements, with the 32B model representing a sweet spot for many applications.