The simplest way to self-host DeepSeek R1 Distill Qwen 1.5B. Launch a dedicated cloud GPU server running Laboratory OS to download and serve the model using any compatible app or framework.
Download model weights for local inference. Must be used with a compatible app, notebook, or codebase. May run slowly, or not work at all, depending on your system resources, particularly GPU(s) and available VRAM.
DeepSeek R1 Distill Qwen 1.5B is a distilled version of the 671B DeepSeek R1 model, optimized for mathematical reasoning and coding tasks. Built on the Qwen2.5-Math-1.5B architecture, it inherits core reasoning capabilities from DeepSeek R1, whose training combined reinforcement learning with supervised fine-tuning, while reducing the parameter count to 1.5B.
DeepSeek R1 Distill Qwen 1.5B represents a significant achievement in model compression and efficient AI design, offering powerful reasoning capabilities in a compact form factor. This distilled model is part of the broader DeepSeek R1 family, which includes six distilled variants ranging from 1.5B to 70B parameters based on the Qwen and Llama architectures.
The model is built upon the Qwen2.5-Math-1.5B architecture and is derived from the much larger DeepSeek R1 model, which features a Mixture-of-Experts (MoE) architecture with 671 billion total parameters (37 billion activated). The development process involved a sophisticated training pipeline that combined reinforcement learning (RL) with supervised fine-tuning (SFT), as detailed in the DeepSeek R1 paper.
The training methodology evolved from the earlier DeepSeek-R1-Zero model, which relied solely on RL. The improved approach incorporated cold-start data before the RL stages, which helped mitigate issues like repetition and poor readability while maintaining strong reasoning capabilities. This refined process has enabled the distilled models, including the 1.5B variant, to achieve state-of-the-art results for dense models of their size.
DeepSeek R1 Distill Qwen 1.5B demonstrates strong performance across benchmarks in mathematics, coding, and reasoning. While per-benchmark scores for the 1.5B model are not listed here, the DeepSeek R1 family's results are notable, with the full-size DeepSeek R1 achieving performance comparable to OpenAI's o1 model.
For context, the full-size DeepSeek R1 model achieves 79.8% Pass@1 on AIME 2024, 97.3% on MATH-500, and a Codeforces rating in the 96.3rd percentile of human competitors.
The distilled models, including the 1.5B variant, retain significant portions of these capabilities while requiring far fewer computational resources. For instance, the 7B distilled version achieves 55.5% on AIME 2024, surpassing the larger QwQ-32B-Preview model's 50% score, demonstrating the effectiveness of the distillation process.
The model can be run locally using vLLM with the following recommended command:
vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B --tensor-parallel-size 2 --max-model-len 32768 --enforce-eager
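Once the server is running, it exposes an OpenAI-compatible API (on port 8000 by default), so any OpenAI-style client can query it. The snippet below is a minimal Python sketch; the host, port, placeholder API key, prompt, and token limit are illustrative assumptions rather than part of the official command.

# Query the vLLM server started above through its OpenAI-compatible API.
# Assumes the default host and port (localhost:8000); adjust if you changed them.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # vLLM does not require a real key

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    messages=[{"role": "user", "content": "What is 17 * 24? Please reason step by step."}],
    max_tokens=1024,
)
print(response.choices[0].message.content)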
For optimal results, users should set the sampling temperature between 0.5 and 0.7 (0.6 is recommended) to avoid repetitive or incoherent output, avoid adding a system prompt (all instructions belong in the user prompt), and, for math problems, instruct the model to reason step by step and put its final answer within \boxed{}.
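The snippet below sketches these recommendations using vLLM's offline API: it applies the model's chat template, uses a temperature of 0.6 with no system prompt, and appends the \boxed{} directive to a math question. The example question, top_p value, and token limit are illustrative assumptions.

from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"

# Build the prompt with the model's chat template; all instructions go in the user turn.
tokenizer = AutoTokenizer.from_pretrained(model_id)
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Solve x^2 - 5x + 6 = 0. Please reason step by step, and put your final answer within \\boxed{}."}],
    tokenize=False,
    add_generation_prompt=True,
)

# Mirror the serve command's context length and eager execution.
llm = LLM(model=model_id, max_model_len=32768, enforce_eager=True)

# Recommended decoding: temperature around 0.6 to avoid repetition or incoherence.
sampling = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=4096)

outputs = llm.generate([prompt], sampling)
print(outputs[0].outputs[0].text)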
The weights are distributed in the safetensors format as BF16 tensors, totaling 1,777,088,000 parameters.
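As a rough guide to what that count implies for memory, the weights alone occupy about two bytes per BF16 parameter, ignoring activations, the KV cache, and framework overhead; the back-of-the-envelope arithmetic below is only a sketch of that estimate.

# Approximate memory needed just to hold the BF16 weights (2 bytes per parameter).
num_params = 1_777_088_000
weight_bytes = num_params * 2  # BF16 = 2 bytes per parameter
print(f"{weight_bytes / 1024**3:.2f} GiB")  # ~3.31 GiB, before activations and KV cache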
The code for DeepSeek R1 Distill Qwen 1.5B is licensed under the MIT License, allowing for commercial use and derivative works. However, it's important to note that the base models used for distillation are subject to their respective licenses: the Qwen-based distills, including this 1.5B variant, derive from the Qwen2.5 series, which is released under the Apache 2.0 License, while the Llama-based distills remain subject to the Llama 3.1 and Llama 3.3 licenses.