The simplest way to self-host DeepSeek R1 Distill Qwen 14B. Launch a dedicated cloud GPU server running Laboratory OS to download and serve the model using any compatible app or framework.
Download model weights for local inference. Must be used with a compatible app, notebook, or codebase. May run slowly, or not work at all, depending on your system resources, particularly GPU(s) and available VRAM.
DeepSeek-R1-Distill-Qwen-14B is a 14.8B parameter model distilled from the larger DeepSeek-R1. Using chain-of-thought training and reinforcement learning, it preserves strong mathematical reasoning capabilities at a much smaller size, and it achieves notable results on math and coding benchmarks.
DeepSeek-R1-Distill-Qwen-14B is a 14.8B parameter causal language model that represents a significant achievement in model distillation technology. The model is derived from the larger DeepSeek-R1 model, which itself builds upon the DeepSeek-V3-Base architecture using a sophisticated development pipeline incorporating both reinforcement learning (RL) and supervised fine-tuning (SFT).
The model utilizes the Qwen2ForCausalLM architecture and is fully compatible with the Hugging Face Transformers library. It is distributed using the Safetensors format for improved security and performance. As detailed in the research paper, the distillation process successfully transfers the reasoning capabilities of the much larger parent model while maintaining a more practical model size.
The model's training process leveraged the reasoning patterns learned by DeepSeek-R1 through its novel two-stage RL and two-stage SFT pipeline. The training approach first established a foundation using a "cold start" phase with thousands of Chain-of-Thought (CoT) examples, followed by reasoning-oriented RL and supervised fine-tuning using rejection sampling. This sophisticated training process helped develop strong capabilities across reasoning, mathematics, and coding tasks.
DeepSeek-R1-Distill-Qwen-14B demonstrates strong performance across mathematics, coding, and general reasoning benchmarks.
For optimal performance, the model developers recommend a temperature between 0.5 and 0.7 during inference to prevent issues like repetition or incoherent outputs. For mathematical problems, they recommend instructing the model to "put your final answer within \boxed{}" so the result can be extracted cleanly.
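As a concrete sketch of these recommendations, the snippet below wires the suggested temperature and the math directive into a prompt-building helper. The `top_p` value and the helper itself are illustrative assumptions, not part of the model card; the quoted directive follows the developers' recommendation.

```python
# Recommended sampling settings (temperature per the model card;
# top_p is an assumed companion value, not from the card).
RECOMMENDED_SAMPLING = {
    "temperature": 0.6,  # within the suggested 0.5-0.7 range
    "top_p": 0.95,       # assumption for illustration
}

def build_math_prompt(problem: str) -> str:
    """Append the recommended directive for math problems so the final
    answer lands inside \\boxed{} and can be parsed from the output."""
    return (
        f"{problem}\n"
        "Please reason step by step, and put your final answer within \\boxed{}."
    )

prompt = build_math_prompt("Compute the sum of the first 10 positive integers.")
```

The resulting `prompt` string and `RECOMMENDED_SAMPLING` dict can then be passed to whatever inference client is in use (Transformers, vLLM, or an OpenAI-compatible endpoint).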
DeepSeek-R1-Distill-Qwen-14B is part of a larger family of distilled models, which includes variants with 1.5B, 7B, 8B, 32B, and 70B parameters. Each variant uses either Qwen2.5 or Llama3 series as base models. Notable among these is the DeepSeek-R1-Distill-Qwen-32B, which achieves superior performance to OpenAI's o1-mini across several benchmarks.
The full family of distilled models includes:
- DeepSeek-R1-Distill-Qwen-1.5B (based on Qwen2.5-Math-1.5B)
- DeepSeek-R1-Distill-Qwen-7B (based on Qwen2.5-Math-7B)
- DeepSeek-R1-Distill-Llama-8B (based on Llama-3.1-8B)
- DeepSeek-R1-Distill-Qwen-14B (based on Qwen2.5-14B)
- DeepSeek-R1-Distill-Qwen-32B (based on Qwen2.5-32B)
- DeepSeek-R1-Distill-Llama-70B (based on Llama-3.3-70B-Instruct)
The model is released under the MIT License, which permits commercial use, modifications, and derivative works. However, users must be aware of and comply with the licenses of the base models used in distillation: Qwen2.5 models (Apache 2.0 License) and Llama models (Llama 3.1/3.3 license).
The model can be served locally using tools such as vLLM for efficient inference. All model weights and implementation details are available through the Hugging Face repository.
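One possible way to serve the model with vLLM is shown below. The parallelism and context-length flags are illustrative and depend on the GPUs available; consult the vLLM documentation for the version in use.

```shell
# Launch an OpenAI-compatible inference server for the model with vLLM.
# --tensor-parallel-size and --max-model-len are illustrative values;
# adjust them to your GPU count and available VRAM.
vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-14B \
  --tensor-parallel-size 2 \
  --max-model-len 32768 \
  --enforce-eager
```

Once running, the server exposes an OpenAI-compatible API (on port 8000 by default), so existing OpenAI client code can point at the local endpoint.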