Model Report
DeepSeek-AI / DeepSeek R1 Distill Qwen 32B
DeepSeek R1 Distill Qwen 32B is a text generation model developed by DeepSeek-AI through knowledge distillation from the larger DeepSeek-R1 teacher model. Built on the Qwen2.5-32B architecture, it utilizes reinforcement learning and distillation techniques to enhance reasoning capabilities across mathematical problems, code generation, and cognitive tasks, demonstrating strong performance on benchmarks like AIME 2024 and MATH-500.
DeepSeek R1 Distill Qwen 32B is a generative artificial intelligence model developed by DeepSeek-AI. Positioned within the DeepSeek-R1 model series, it emphasizes enhanced reasoning, combining reinforcement learning (RL) with knowledge distillation. Built upon the Qwen2.5-32B base model, DeepSeek R1 Distill Qwen 32B is designed to excel at mathematical reasoning, code generation, and general cognitive tasks. The distillation process transfers reasoning patterns from the larger DeepSeek-R1 teacher model, yielding a dense, efficient model suitable for broad research and application contexts.
Benchmark comparison between DeepSeek-R1 models and peer systems across tasks such as AIME 2024, Codeforces, GPQA Diamond, and MATH-500, highlighting the model's relative accuracy and percentile performance.
DeepSeek R1 Distill Qwen 32B inherits its dense architecture from Qwen2.5-32B, further refined through a targeted distillation strategy. The teacher model, DeepSeek-R1, is a Mixture-of-Experts (MoE) language model based on the DeepSeek-V3-Base architecture. DeepSeek-R1 features 671 billion total parameters, with 37 billion activated per forward pass.
The development of DeepSeek-R1 and its distilled derivatives uses a two-stage RL methodology: an initial RL stage to establish complex reasoning behaviors, and a subsequent RL stage to align outputs with human preferences for helpfulness and harmlessness. Before each RL stage, supervised fine-tuning (SFT) is employed to "seed" basic reasoning and general language skills. The distillation step for models such as DeepSeek R1 Distill Qwen 32B uses approximately 800,000 curated samples generated by the DeepSeek-R1 teacher, fine-tuning the target model for advanced reasoning performance.
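Conceptually, the distillation step is plain supervised fine-tuning of the student on teacher-generated traces, with no RL applied to the student itself. The following is a minimal sketch under that reading; the stand-in model name, toy data, and hyperparameters are illustrative assumptions, not DeepSeek's released training code.

```python
# Sketch: distillation as SFT on teacher-generated reasoning traces.
# Model name, sample format, and hyperparameters are illustrative only;
# the actual student is Qwen2.5-32B trained on ~800k curated samples.
import torch
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer

STUDENT = "Qwen/Qwen2.5-0.5B"  # small stand-in for Qwen2.5-32B

tokenizer = AutoTokenizer.from_pretrained(STUDENT)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(STUDENT)

# Each record pairs a prompt with a teacher-generated reasoning trace.
teacher_traces = [
    {"prompt": "Solve: 2x + 3 = 11.",
     "response": "<think>2x = 11 - 3 = 8, so x = 4.</think> x = 4"},
    # ... ~800,000 curated samples in the actual pipeline
]

def collate(batch):
    texts = [ex["prompt"] + "\n" + ex["response"] + tokenizer.eos_token
             for ex in batch]
    enc = tokenizer(texts, return_tensors="pt", padding=True,
                    truncation=True, max_length=1024)
    labels = enc["input_ids"].clone()
    labels[enc["attention_mask"] == 0] = -100  # ignore padding in the loss
    enc["labels"] = labels
    return enc

loader = DataLoader(teacher_traces, batch_size=2, collate_fn=collate)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

model.train()
for batch in loader:
    loss = model(**batch).loss  # next-token cross-entropy on teacher tokens
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```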
Data Sources and Training Methodology
Training data for DeepSeek R1 Distill Qwen 32B is generated via the DeepSeek-R1 model and comprises two principal categories. The first and largest portion consists of around 600,000 samples oriented toward reasoning-intensive challenges—such as mathematics, coding, science, and logic—curated using rule-based and generative reward mechanisms. The second portion includes approximately 200,000 general-purpose samples, covering writing, factual question answering, translation, and self-cognition, with a subset explicitly leveraging Chain-of-Thought (CoT) reasoning.
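To make the composition concrete, here is a minimal sketch of assembling such a mixture with the Hugging Face datasets library. The file paths are hypothetical placeholders (DeepSeek has not released the distillation corpus); only the roughly 600,000:200,000 (3:1) proportions come from the paper.

```python
# Sketch of the two-part SFT mixture: ~600k reasoning samples and
# ~200k general-purpose samples. Dataset files are hypothetical.
from datasets import load_dataset, interleave_datasets

reasoning = load_dataset("json", data_files="r1_reasoning_samples.jsonl",
                         split="train")
general = load_dataset("json", data_files="r1_general_samples.jsonl",
                       split="train")

mixture = interleave_datasets(
    [reasoning, general],
    probabilities=[600_000 / 800_000, 200_000 / 800_000],  # 0.75 / 0.25
    seed=42,
)
```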
The upstream DeepSeek-R1 teacher model is trained via a staged process. It begins with DeepSeek-R1-Zero, which applies RL using Group Relative Policy Optimization (GRPO) without any initial SFT. This yields emergent reasoning capabilities but introduces challenges such as repetition and mixed-language responses. The subsequent DeepSeek-R1 stage improves on this by pre-conditioning the model with a small volume of carefully selected CoT data (a "cold start"), enhancing readability and language consistency. The final distillation datasets, produced after RL convergence, are curated via rejection sampling and SFT to exclude undesirable attributes (e.g., mixed language or excessive length), ensuring high-quality training inputs for downstream distillation, as detailed in the DeepSeek-R1 research paper.
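The group-relative advantage that gives GRPO its name can be stated compactly: each sampled completion's reward is normalized against the mean and standard deviation of its own group of samples for the same prompt, replacing a learned critic model. A minimal sketch of just that computation (the clipped ratio and KL-divergence penalty of the full objective are omitted):

```python
# Group-relative advantage used in GRPO: normalize each completion's
# reward within its own group, so no separate value/critic model is needed.
import torch

def group_relative_advantages(rewards: torch.Tensor,
                              eps: float = 1e-6) -> torch.Tensor:
    """rewards: shape (group_size,), scores for one prompt's sampled group."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Example: 4 completions scored by a rule-based reward (answer correctness).
adv = group_relative_advantages(torch.tensor([1.0, 0.0, 1.0, 0.0]))
# Correct completions receive positive advantage; incorrect ones negative.
```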
Performance and Benchmark Evaluation
DeepSeek R1 Distill Qwen 32B demonstrates robust performance on a diverse set of evaluation benchmarks in mathematical reasoning, code generation, and general knowledge tasks. On tasks like AIME 2024, MATH-500, and LiveCodeBench, it outperforms several competing dense models and closely follows or, in some cases, surpasses models that are considerably larger or trained with different methodologies.
Across key benchmarks, DeepSeek R1 Distill Qwen 32B has been shown to:
Achieve a Pass@1 score of 72.6 on AIME 2024, outperforming both QwQ-32B-Preview and o1-mini.
Reach a Pass@1 of 94.3 on MATH-500 and 57.2 on LiveCodeBench, reflecting strong mathematical and coding abilities.
Excel on GPQA Diamond and Codeforces, ranking among the top performers for general knowledge and programming competition proficiency.
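For context on the metric, these pass@1 figures follow the standard unbiased pass@k estimator of Chen et al. (2021); the DeepSeek-R1 paper reports pass@1 averaged over multiple sampled generations per problem, which is the k=1 case. A small sketch:

```python
# Unbiased pass@k estimator: with n samples per problem, c of them correct,
# pass@k = 1 - C(n-c, k) / C(n, k). For k=1 this reduces to c/n, the average
# per-sample success rate.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g., 64 sampled solutions, 46 correct -> pass@1 = 46/64 ~ 0.719
print(pass_at_k(n=64, c=46, k=1))
```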
The performance gains observed with DeepSeek R1 Distill Qwen 32B are attributed to its distillation from the large-scale RL-trained DeepSeek-R1 model rather than to direct RL training at the smaller scale, which the DeepSeek-R1 paper found less effective at comparable model size. This approach transfers the teacher model's reasoning strategies into a more compact and versatile system.
Applications and Use Cases
DeepSeek R1 Distill Qwen 32B is particularly well-suited for scenarios demanding intricate reasoning, robust problem-solving, and high-level cognitive tasks. Typical applications include advanced mathematics problem solving (as evaluated by AIME and MATH-500 benchmarks), code synthesis and debugging (as reflected in LiveCodeBench and Codeforces performance), and general knowledge reasoning (as assessed by GPQA Diamond).
Given its proficiency in Chain-of-Thought style tasks, the model finds further application in question answering, technical and scientific writing, summarization, and scenarios where explanations and stepwise logic are critical. The broader DeepSeek-R1 family, from which this model is derived, is also used for creative writing, editing, and tasks requiring extended context handling, as described in the DeepSeek-R1 research documentation.
Limitations
While DeepSeek R1 Distill Qwen 32B demonstrates substantial capabilities, certain limitations are documented. The model is optimized for Chinese and English, which can result in language mixing or reduced performance when queried in other languages. Its outputs are prompt-sensitive, and few-shot prompts may lead to diminished reasoning ability; zero-shot querying is generally recommended. The model performs best when instructions are provided directly in the user prompt, without using a separate system prompt.
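A minimal usage sketch reflecting these recommendations, via an OpenAI-compatible endpoint such as a local vLLM or Ollama server. The URL and model tag are assumptions about a local deployment; the 0.6 temperature follows DeepSeek's published usage guidance of roughly 0.5 to 0.7.

```python
# Sketch of the recommended prompting style: a single zero-shot user turn,
# no system message, all instructions inline. Endpoint URL and model tag
# are assumptions for a local OpenAI-compatible server.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="deepseek-r1-distill-qwen-32b",
    temperature=0.6,  # per DeepSeek's suggested 0.5-0.7 range
    messages=[
        # No system prompt, no few-shot examples.
        {"role": "user",
         "content": "Solve step by step: what is the sum of the first 50 "
                    "positive odd integers? Put your final answer in "
                    "\\boxed{}."}
    ],
)
print(response.choices[0].message.content)
```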
In comparison to earlier models such as DeepSeek-V3, DeepSeek-R1 derivatives may be less capable in some complex structured tasks, including function calling or multi-turn dialogues. Additionally, in domain-specific software engineering benchmarks, improvements over previous generations are limited by the current scope of RL training data.
Licensing
DeepSeek R1 Distill Qwen 32B is released under the MIT License, which grants broad rights for use, modification, and distribution, including in commercial settings. The underlying Qwen2.5-32B base model is distributed under the Apache 2.0 License, and users should be mindful of its terms and attribution requirements. The companion models distilled from the Llama series inherit their respective Llama 3.1 or Llama 3.3 licenses.