DeepSeek V3 is a 671-billion-parameter Mixture-of-Experts language model that activates 37 billion parameters per token during inference. Developed by DeepSeek AI and released in December 2024, it features Multi-head Latent Attention (MLA), auxiliary-loss-free load balancing, and FP8 mixed-precision training. The model was trained on 14.8 trillion tokens, supports context lengths up to 128,000 tokens, and performs strongly on coding, mathematical reasoning, and multilingual benchmarks.
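Because the MoE router activates only 37 billion of the 671 billion parameters per token (about 5.5%), per-token inference cost is closer to that of a much smaller dense model. As a rough sketch of how the model is typically queried: DeepSeek V3 is commonly served behind an OpenAI-compatible chat-completions API. The base URL, model identifier, and API key below are placeholder assumptions; substitute the values your provider documents.

```python
# Minimal sketch: querying DeepSeek V3 through an OpenAI-compatible
# chat-completions endpoint. The base_url and model name are assumptions
# and vary by provider -- check your provider's documentation.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepseek.com",  # assumed endpoint; provider-specific
    api_key="YOUR_API_KEY",               # placeholder credential
)

response = client.chat.completions.create(
    model="deepseek-chat",  # assumed identifier for DeepSeek V3
    messages=[
        {"role": "user", "content": "Write a function that reverses a linked list."}
    ],
    max_tokens=512,
)

print(response.choices[0].message.content)
```

Requests that exceed the 128,000-token context window (prompt plus generated tokens) are rejected or truncated depending on the serving stack, so long-document workloads should budget `max_tokens` against the prompt length.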