The simplest way to self-host DeepSeek R1 Distill Llama 70B. Launch a dedicated cloud GPU server running Laboratory OS to download and serve the model using any compatible app or framework.
Download model weights for local inference. Must be used with a compatible app, notebook, or codebase. May run slowly, or not work at all, depending on your system resources, particularly GPU(s) and available VRAM.
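As a sketch of the download path, assuming the model's Hugging Face repository (deepseek-ai/DeepSeek-R1-Distill-Llama-70B), the huggingface_hub client can fetch the weights; note the full checkpoint is on the order of 140 GB.

```python
from huggingface_hub import snapshot_download

# Downloads the full checkpoint (~140 GB) into the local Hugging Face cache;
# pass local_dir=... to choose a specific destination instead.
path = snapshot_download("deepseek-ai/DeepSeek-R1-Distill-Llama-70B")
print(f"Weights downloaded to: {path}")
```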
DeepSeek R1 Distill Llama 70B is a language model trained through a hybrid approach combining reinforcement learning with supervised fine-tuning. Its teacher model, DeepSeek-R1, excels at mathematical reasoning (79.8% pass@1 on AIME 2024) and coding tasks (96.3rd percentile on Codeforces). Training uses Group Relative Policy Optimization, which avoids the cost of a separate critic model.
DeepSeek R1 Distill Llama 70B represents a significant achievement in distilling advanced reasoning capabilities from larger language models into more efficient architectures. Built on the Llama architecture, it is part of the broader DeepSeek R1 family, whose distilled variants range from 1.5B to 70B parameters and are based on the Qwen2.5 and Llama3 series.
The model emerges from a training pipeline that combines reinforcement learning (RL) and supervised fine-tuning (SFT). Unlike its predecessor DeepSeek-R1-Zero, which relied solely on RL, the R1 series incorporates cold-start data before the RL phase, as detailed in the DeepSeek R1 paper. This approach mitigates the repetition, poor readability, and language-mixing issues observed in the earlier model.
The training process involves multiple stages:

- Cold-start supervised fine-tuning on a small set of curated long chain-of-thought examples
- Reasoning-oriented reinforcement learning on top of the fine-tuned base
- Rejection sampling of the RL checkpoint to produce new supervised data, combined with non-reasoning data for a second SFT round
- A final reinforcement learning stage covering prompts from all scenarios
- Distillation of the resulting reasoning behavior into smaller dense models, including this Llama-based 70B variant
The training pipeline employs Group Relative Policy Optimization (GRPO) for reinforcement learning, estimating the baseline from a group of sampled outputs rather than from a separate critic model. The reward system combines accuracy rewards with format rewards, and a training template guides the model to wrap its reasoning process in designated tags (e.g., <think>...</think>).
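To make the group-relative idea concrete, the sketch below computes GRPO-style advantages for a group of sampled completions: each completion's reward is normalized against the group's mean and standard deviation, so no learned value function (critic) is needed. The toy reward function, with its accuracy check and <think>-tag format check, is a simplified illustration of the accuracy-plus-format scheme described in the paper, not DeepSeek's actual implementation.

```python
import re
from statistics import mean, stdev

def reward(completion: str, reference_answer: str) -> float:
    """Toy reward combining an accuracy term and a format term.

    A simplified stand-in for the rule-based rewards in the DeepSeek-R1
    paper; the real system verifies answers with task-specific rules
    (e.g., math checkers, compiler test cases).
    """
    # Format reward: reasoning must be wrapped in <think>...</think> tags.
    format_ok = bool(re.search(r"<think>.*?</think>", completion, re.DOTALL))
    # Accuracy reward: the final answer after the think block matches.
    answer = completion.split("</think>")[-1].strip()
    accuracy_ok = answer == reference_answer
    return 1.0 * accuracy_ok + 0.5 * format_ok

def grpo_advantages(completions: list[str], reference_answer: str) -> list[float]:
    """Group-relative advantages: normalize each reward against the group.

    GRPO plugs these advantages directly into a clipped policy-gradient
    objective, so no separate critic network has to be trained.
    """
    rewards = [reward(c, reference_answer) for c in completions]
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 1.0
    return [(r - mu) / (sigma + 1e-6) for r in rewards]

# Example: a group of 4 sampled completions for one prompt.
group = [
    "<think>2+2 is 4</think> 4",
    "<think>guessing</think> 5",
    "4",                          # correct but missing the think tags
    "<think>2+2=4</think> 4",
]
print(grpo_advantages(group, reference_answer="4"))
```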
DeepSeek R1 Distill Llama 70B demonstrates strong performance across a range of benchmarks; the full DeepSeek-R1 model it is distilled from achieves results comparable to OpenAI's o1-1217. Results reported for this distilled model include:

- 70.0% pass@1 on AIME 2024
- 94.5% pass@1 on MATH-500
- 65.2% pass@1 on GPQA Diamond
- 57.5% pass@1 on LiveCodeBench
- A Codeforces rating of 1,633
The model shows particular strength in reasoning, mathematical problem-solving, and coding challenges. Within the R1 family, the paper also reports improved performance on knowledge benchmarks (MMLU, MMLU-Pro, GPQA Diamond) and the long-context FRAMES task relative to DeepSeek-V3.
For optimal performance, DeepSeek recommends:

- Setting the temperature between 0.5 and 0.7 (0.6 is suggested) to avoid endless repetition or incoherent output
- Avoiding system prompts and placing all instructions in the user prompt
- For math problems, adding a directive such as "please reason step by step, and put your final answer within \boxed{}"
- Enforcing that the model begins its response with "<think>\n", since it occasionally skips the thinking pattern otherwise
- Running multiple tests and averaging the results when benchmarking

A minimal request that follows these settings is sketched below.
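This sketch shows a hypothetical chat-completions payload that applies the recommendations above; the exact serving endpoint and request schema depend on your deployment, so treat the field names as illustrative.

```python
import json

# A hypothetical chat-completions payload following DeepSeek's usage
# recommendations; the serving stack and endpoint are up to you.
request = {
    "model": "deepseek-ai/DeepSeek-R1-Distill-Llama-70B",
    # No system message: put all instructions in the user turn.
    "messages": [
        {
            "role": "user",
            "content": (
                "Solve x^2 - 5x + 6 = 0. "
                "Please reason step by step, and put your final answer "
                "within \\boxed{}."
            ),
        }
    ],
    "temperature": 0.6,  # recommended range is 0.5-0.7
    "top_p": 0.95,
    "max_tokens": 8192,  # leave room for the long <think> section
}
print(json.dumps(request, indent=2))
```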
The model can be deployed locally, with vLLM recommended as the serving framework. Given the 70B parameter count (roughly 140 GB of weights at 16-bit precision), local deployment typically requires multiple high-memory GPUs with tensor parallelism.
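As a sketch of local inference with recent vLLM versions, assuming enough GPU memory to hold the 70B weights (e.g., four 80 GB cards), the offline API can be used as follows; the tensor_parallel_size and context length shown are illustrative choices, not official guidance.

```python
from vllm import LLM, SamplingParams

# Illustrative settings: tensor_parallel_size and max_model_len depend on
# your hardware; a 70B model needs multiple high-memory GPUs.
llm = LLM(
    model="deepseek-ai/DeepSeek-R1-Distill-Llama-70B",
    tensor_parallel_size=4,
    max_model_len=32768,
)

# Sampling settings per DeepSeek's recommendations.
params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=8192)

outputs = llm.chat(
    [{"role": "user", "content": "How many primes are there below 100?"}],
    sampling_params=params,
)
print(outputs[0].outputs[0].text)
```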
The model operates under a multi-tiered licensing structure:

- The DeepSeek-R1 series, including this model's weights and outputs, is released under the MIT License, which permits commercial use, modification, and further distillation
- Because this variant is distilled from Llama-3.3-70B-Instruct, it also remains subject to the Llama 3.3 community license governing its base model
Users should carefully review the licensing terms, particularly when considering modifications or derivative works, as the base Llama model's license terms apply to this distilled version.