The simplest way to self-host DeepSeek Coder V2. Launch a dedicated cloud GPU server running Lab Station OS to download and serve the model using any compatible app or framework.
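If the serving framework exposes an OpenAI-compatible HTTP endpoint (an assumption; this page does not describe Lab Station OS's API), a hosted instance can be queried with a few lines of client code. The endpoint URL, API key, and model name below are placeholders.

```python
# Minimal client sketch, assuming the server exposes an OpenAI-compatible
# HTTP endpoint. The base_url and model name are placeholders, not details
# taken from this page.
from openai import OpenAI

client = OpenAI(
    base_url="http://your-server:8000/v1",  # hypothetical server address
    api_key="not-needed-for-local",         # many local servers ignore the key
)

response = client.chat.completions.create(
    model="deepseek-coder-v2",  # placeholder model name
    messages=[{
        "role": "user",
        "content": "Write a Python function that checks whether a string is a palindrome.",
    }],
)
print(response.choices[0].message.content)
```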
Download model weights for local inference. Must be used with a compatible app, notebook, or codebase. May run slowly, or not work at all, depending on your system resources, particularly GPU(s) and available VRAM.
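As a rough sketch of local inference with Hugging Face Transformers (one compatible codebase), the example below loads the 16B variant. The repository id, dtype, and device settings are assumptions and depend on your hardware, particularly available VRAM.

```python
# Local-inference sketch with Hugging Face Transformers. The model id and
# memory settings are illustrative; adjust them to your GPU(s) and VRAM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-Coder-V2-Lite-Base"  # assumed repo id for the 16B variant
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # halves memory vs. fp32; needs a recent GPU
    device_map="auto",           # spread layers across available GPUs
    trust_remote_code=True,
)

prompt = "# Write a quicksort implementation in Python\ndef quicksort(arr):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```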
DeepSeek Coder V2 is a code-focused language model using Mixture-of-Experts architecture, available in 16B and 236B parameter versions. It handles 128K context length, supports 338 programming languages, and excels at code generation, debugging, and mathematical reasoning tasks. Trained on code (60%), math (10%), and text (30%).
DeepSeek Coder V2 represents a significant advancement in open-source code language models, built upon the innovative DeepSeekMoE framework. This Mixture-of-Experts (MoE) architecture model has been further pre-trained from an intermediate DeepSeek-V2 checkpoint using an additional 6 trillion tokens, marking a substantial improvement over its predecessor.
The model employs a Mixture-of-Experts architecture available in two variants:
DeepSeek-Coder-V2-Lite: 16B total parameters, with 2.4B activated per token.
DeepSeek-Coder-V2: 236B total parameters, with 21B activated per token.
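To make the Mixture-of-Experts idea concrete, the sketch below shows a generic top-k expert-routing layer: a router scores each token against a pool of small feed-forward "experts" and only the top-scoring experts run for that token. It illustrates the mechanism only; the expert counts, layer sizes, and routing details are placeholders, not DeepSeek Coder V2's actual configuration.

```python
# Simplified top-k expert routing as used in MoE feed-forward layers.
# Sizes, expert count, and k are placeholders for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=1024, num_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, num_experts)  # scores each expert per token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):                                   # x: (tokens, d_model)
        gate_probs = F.softmax(self.router(x), dim=-1)      # (tokens, num_experts)
        weights, indices = gate_probs.topk(self.k, dim=-1)  # keep top-k experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for slot in range(self.k):                          # send each token to its chosen experts
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Example: 4 token embeddings through the MoE layer
layer = TopKMoELayer()
print(layer(torch.randn(4, 512)).shape)  # torch.Size([4, 512])
```

Only the selected experts run for each token, which is why the 236B variant can activate just a fraction of its parameters per token.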
The additional 6 trillion pre-training tokens are composed of roughly 60% source code, 10% math corpus, and 30% natural-language text.
The training process used Next-Token-Prediction and Fill-in-the-Middle (FIM) objectives, with FIM enabled specifically for the 16B variant. The model was then fine-tuned on an instruction dataset and aligned using Group Relative Policy Optimization (GRPO), incorporating compiler feedback and test cases for code-related tasks.
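A minimal sketch of how a FIM training example can be constructed from plain code is shown below: the document is split into prefix, middle, and suffix, and the pieces are rearranged so the model learns to generate the missing middle from its surroundings. The sentinel strings are placeholders; the actual special tokens are tokenizer-specific and not given on this page.

```python
# Sketch of Fill-in-the-Middle (FIM) sample construction. The sentinel
# strings below are placeholders, not DeepSeek's actual special tokens.
import random

FIM_PREFIX, FIM_SUFFIX, FIM_MIDDLE = "<fim_prefix>", "<fim_suffix>", "<fim_middle>"

def make_fim_example(code: str, rng: random.Random) -> str:
    """Turn a code string into a prefix-suffix-middle training sample."""
    i, j = sorted(rng.sample(range(len(code)), 2))
    prefix, middle, suffix = code[:i], code[i:j], code[j:]
    # PSM ordering: the model sees prefix and suffix, then generates the middle.
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}{middle}"

rng = random.Random(0)
print(make_fim_example("def add(a, b):\n    return a + b\n", rng))
```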
DeepSeek Coder V2 introduces several notable improvements over its predecessor, DeepSeek-Coder-33B:
Extended Language Support: Coverage expanded from 86 to 338 programming languages, as detailed in the supported languages list.
Increased Context Length: Context window expanded from 16K to 128K tokens, enabling handling of more complex coding tasks.
Enhanced Performance: The model demonstrates superior capabilities in code generation, debugging, and mathematical reasoning, as illustrated in the example below.
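As an illustrative code-fixing prompt with an instruction-tuned variant (the repository id and chat-template workflow are assumptions based on standard Transformers usage, not details from this page):

```python
# Illustrative code-fixing prompt for an instruct-tuned variant.
# The repo id and settings are assumptions; adjust to your hardware.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

messages = [{
    "role": "user",
    "content": "This function should return the factorial of n but never terminates. Fix it:\n"
               "def fact(n):\n    while n > 1:\n        n = n * fact(n - 1)\n    return n",
}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```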
According to the research paper, DeepSeek Coder V2 achieves strong results across standard coding and math benchmarks, often surpassing both open-source and closed-source competitors.
While the model shows particular strength in reasoning tasks, it exhibits a relative weakness in knowledge-intensive tasks compared to DeepSeek-V2. There remains a gap in instruction-following capabilities compared to state-of-the-art closed-source models, particularly in complex scenarios.
The model is released under permissive licensing terms, with the code available under the MIT License and the model under a separate Model License. Both research and commercial use are permitted.