The simplest way to self-host DeepSeek V3. Launch a dedicated cloud GPU server running Lab Station OS to download and serve the model using any compatible app or framework.
Download model weights for local inference. Must be used with a compatible app, notebook, or codebase. May run slowly, or not work at all, depending on your system resources, particularly GPU(s) and available VRAM.
DeepSeek-V3 is a 671B parameter MoE language model that activates 37B parameters per token. It features Multi-head Latent Attention, dynamic load balancing, and Multi-Token Prediction for faster inference. Trained on 14.8T tokens with a 128K context window, it excels at mathematics and coding tasks.
DeepSeek-V3 represents a significant advancement in large language model architecture, featuring 671B total parameters, of which 37B are activated per token through its Mixture-of-Experts (MoE) design. Building upon its predecessor DeepSeek-V2, the model incorporates several innovative technologies detailed in the technical report.
The model's architecture combines Multi-head Latent Attention (MLA) with the DeepSeekMoE framework, enabling efficient inference and cost-effective training. A standout innovation is its auxiliary-loss-free load balancing strategy, which dynamically adjusts per-expert bias terms to keep expert load balanced without relying on auxiliary loss functions, thereby minimizing the performance degradation typically associated with load-balancing techniques.
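A minimal sketch of how such a bias-based balancing step could look is shown below; the function names, update rule, and update rate are illustrative assumptions and do not come from the DeepSeek-V3 codebase.

```python
import numpy as np

def update_expert_bias(expert_load, bias, update_rate=1e-3):
    """Hypothetical sketch of an auxiliary-loss-free balancing step.

    Each expert carries a bias that is added to its routing score only when
    selecting experts. Overloaded experts get their bias nudged down and
    underloaded experts get it nudged up, steering future routing toward
    balance without adding an auxiliary loss term to the training objective.
    """
    mean_load = expert_load.mean()
    return bias - update_rate * np.sign(expert_load - mean_load)

def select_experts(affinity, bias, top_k=8):
    """Pick top-k experts per token from the biased scores; the gating
    weights themselves would still be derived from the unbiased affinities."""
    biased = affinity + bias                        # (num_tokens, num_experts)
    return np.argsort(-biased, axis=-1)[:, :top_k]
```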
DeepSeek-V3 introduces a Multi-Token Prediction (MTP) objective, which both improves model performance and enables faster inference through speculative decoding. The model employs an FP8 mixed-precision training framework with fine-grained tile-wise and block-wise quantization, improving training efficiency and reducing memory requirements. Additionally, it implements DualPipe, a novel pipeline parallelism algorithm that overlaps computation and communication phases to hide communication overhead.
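As a rough illustration of fine-grained quantization, the sketch below gives each 1x128 tile of a tensor its own scale before casting to FP8 (E4M3); the tile size, helper names, and clamping are assumptions for illustration, not the training framework's actual kernels.

```python
import torch

FP8_E4M3_MAX = 448.0  # largest finite magnitude representable in E4M3

def quantize_tile_wise(x: torch.Tensor, tile: int = 128):
    """Illustrative fine-grained (tile-wise) quantization to FP8.

    Each contiguous group of `tile` elements along the last dimension gets
    its own scale, so an outlier in one tile does not force the rest of the
    tensor onto a coarse scale.
    """
    assert x.shape[-1] % tile == 0, "last dimension must be divisible by tile"
    tiles = x.reshape(-1, tile)                              # (num_tiles, tile)
    scale = tiles.abs().amax(dim=-1, keepdim=True) / FP8_E4M3_MAX
    scale = scale.clamp(min=1e-12)                           # avoid division by zero
    tiles_fp8 = (tiles / scale).to(torch.float8_e4m3fn)      # per-tile cast to FP8
    return tiles_fp8.reshape(x.shape), scale                 # scales kept for dequantization
```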
The model underwent extensive training on 14.8 trillion tokens, followed by Supervised Fine-Tuning and Reinforcement Learning phases. Notably, the entire training process required only 2.788M H800 GPU hours and demonstrated remarkable stability without experiencing irrecoverable loss spikes or requiring rollbacks, as detailed in the model documentation.
Performance benchmarks show DeepSeek-V3 outperforming other open-source models and achieving results competitive with leading closed-source models such as GPT-4o and Claude-3.5-Sonnet. The model particularly excels at mathematics and coding tasks. It supports a 128K-token context window and demonstrates strong performance on the Needle In A Haystack (NIAH) test.
DeepSeek-V3 is available in both Base and Chat variants, with the complete model weights totaling 685GB (671B main model weights plus 14GB for the MTP module). The model can be run locally through various inference frameworks, including SGLang, LMDeploy, TensorRT-LLM, and vLLM.
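For example, once one of these frameworks is serving the model behind an OpenAI-compatible endpoint (as vLLM and SGLang can), it can be queried with a few lines of Python; the host, port, and model identifier below are placeholders to adjust for your own deployment.

```python
from openai import OpenAI

# Placeholder endpoint and model name: adjust these to match how your
# serving framework was launched.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3",
    messages=[{"role": "user", "content": "Explain Multi-head Latent Attention in two sentences."}],
    temperature=0.3,
)
print(response.choices[0].message.content)
```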
The model weights are provided in FP8 format, with a conversion script available for producing BF16 weights. While support in Hugging Face's Transformers library is still in development, multiple optimized implementations exist for various hardware platforms, including AMD GPUs and Huawei Ascend NPUs.
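Conceptually, that conversion multiplies each FP8 weight block by its stored scale and casts the result to BF16. The sketch below illustrates the idea under an assumed 128x128 block layout and per-block scale tensor; the repository's actual script handles the real checkpoint format.

```python
import torch

def fp8_block_to_bf16(weight_fp8: torch.Tensor, scale: torch.Tensor, block: int = 128):
    """Illustrative dequantization of a block-wise FP8 weight to BF16.

    `scale` is assumed to hold one scaling factor per (block x block) region
    of the weight matrix; each factor is broadcast back over its region.
    """
    rows, cols = weight_fp8.shape
    w = weight_fp8.to(torch.float32)
    scale_full = scale.repeat_interleave(block, dim=0)[:rows]
    scale_full = scale_full.repeat_interleave(block, dim=1)[:, :cols]
    return (w * scale_full).to(torch.bfloat16)
```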
The code is released under the MIT License, while the model itself is governed by a separate Model License that permits commercial use. The complete model implementation and documentation are available through the DeepSeek-V3 repository.