The simplest way to self-host DeepSeek R1 Distill Llama 8B. Launch a dedicated cloud GPU server running Laboratory OS to download and serve the model using any compatible app or framework.
Download model weights for local inference. Must be used with a compatible app, notebook, or codebase. May run slowly, or not work at all, depending on your system resources, particularly GPU(s) and available VRAM.
DeepSeek R1 Distill Llama 8B is an 8 billion parameter model distilled from the much larger 671B-parameter DeepSeek-R1 Mixture-of-Experts model. It inherits strong reasoning capabilities from its parent model while being far more computationally efficient, and it performs well on math, reasoning, and code tasks, with a recommended sampling temperature of 0.5-0.7.
DeepSeek-R1-Distill-Llama-8B represents a significant advancement in efficient language model design, offering strong reasoning capabilities in a relatively compact 8B parameter architecture. This model is part of the broader DeepSeek-R1 family, which includes various distilled versions ranging from 1.5B to 70B parameters based on both Llama and Qwen architectures.
The model is a distilled version derived from the much larger DeepSeek-R1, which features a Mixture-of-Experts (MoE) architecture with 671B total parameters (37B activated). The development process involved multiple stages, beginning with the creation of DeepSeek-R1-Zero, which was trained purely through reinforcement learning. While DeepSeek-R1-Zero showed promising reasoning capabilities, it suffered from issues like repetition and poor readability.
To address these limitations, researchers developed DeepSeek-R1 using a novel approach that incorporated a cold-start data phase before reinforcement learning. This improved model was then distilled into smaller, more efficient versions, including the 8B parameter variant built on the Llama-3.1-8B base model.
DeepSeek-R1-Distill-Llama-8B demonstrates impressive performance across various benchmarks, particularly in reasoning, mathematics, and code-related tasks. While it is the full DeepSeek-R1 model that achieves results comparable to OpenAI's o1 in several areas, the distilled 8B variant retains much of that reasoning capability at a fraction of the size, though specific performance metrics vary by task.
The model performs optimally with specific parameter settings:

- Set the temperature within the range of 0.5-0.7 (0.6 is a common choice)
- Avoid adding a system prompt; include all instructions in the user prompt instead
These settings help mitigate potential issues such as infinite loops or incoherent outputs while maintaining high-quality reasoning capabilities.
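As a minimal sketch, the settings above can be expressed as a plain dictionary and passed to whichever inference framework you use (for example, vLLM's `SamplingParams` or an OpenAI-compatible client). The `top_p` and `max_tokens` values and the clamp helper below are illustrative assumptions, not part of the official recommendations quoted here.

```python
# Illustrative sampling settings for DeepSeek-R1-Distill-Llama-8B.
# Only the temperature range (0.5-0.7) comes from the recommendations above;
# top_p and max_tokens are plausible defaults, not official guidance.
RECOMMENDED = {
    "temperature": 0.6,   # stay within 0.5-0.7 to avoid loops / incoherence
    "top_p": 0.95,        # assumed value
    "max_tokens": 4096,   # reasoning traces can be long; adjust to taste
}

def clamped_temperature(requested: float, low: float = 0.5, high: float = 0.7) -> float:
    """Clamp a requested temperature into the recommended range.

    Hypothetical helper: guards against callers passing extreme values
    (e.g. 0.0 or 1.5) that tend to produce repetition or incoherent output.
    """
    return max(low, min(high, requested))

print(clamped_temperature(1.0))  # 0.7
print(clamped_temperature(0.0))  # 0.5
```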
The DeepSeek-R1 distilled family includes several variants:

- DeepSeek-R1-Distill-Qwen-1.5B
- DeepSeek-R1-Distill-Qwen-7B
- DeepSeek-R1-Distill-Llama-8B
- DeepSeek-R1-Distill-Qwen-14B
- DeepSeek-R1-Distill-Qwen-32B
- DeepSeek-R1-Distill-Llama-70B
Each variant uses either the Qwen or Llama architecture as its base model, with the 8B Llama variant offering a practical balance between performance and computational efficiency. The family as a whole demonstrates that smaller models can achieve strong reasoning performance when trained on knowledge distilled from more powerful predecessors.
The model is open-sourced under multiple licenses:

- The DeepSeek-R1 distilled weights are released under the MIT License
- The Llama-based variants, including this 8B model, are additionally subject to the Llama 3.1 license of their base model
- The Qwen-based variants derive from base models released under the Apache 2.0 License
The model can be accessed through Hugging Face and run locally using tools like vLLM.
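A minimal local-serving sketch with vLLM's OpenAI-compatible server is shown below. It assumes a recent vLLM release (where the `vllm serve` command is available), a GPU with enough VRAM for the 8B weights, and the default port 8000; flags and defaults may differ across vLLM versions.

```shell
# Install vLLM and serve the model from Hugging Face
# (downloads the weights on first run).
pip install vllm
vllm serve deepseek-ai/DeepSeek-R1-Distill-Llama-8B

# From another terminal, query the OpenAI-compatible endpoint,
# using the temperature range recommended above:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
        "messages": [{"role": "user", "content": "What is 17 * 24?"}],
        "temperature": 0.6,
        "max_tokens": 1024
      }'
```

Because the server speaks the OpenAI API, any OpenAI-compatible client library can be pointed at `http://localhost:8000/v1` instead of raw curl.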