DeepSeek V3 is a family of large-scale open-weights language models developed by DeepSeek AI. It features a Mixture-of-Experts (MoE) architecture with 671 billion total parameters, of which only 37 billion are activated per token. The family uses Multi-head Latent Attention (MLA) and FP8 mixed-precision training, and supports a 128K-token context length for tasks including reasoning, coding, mathematics, and multilingual understanding across research and commercial applications.
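The sparse-activation idea behind the 671B-total / 37B-active figure can be illustrated with a toy Mixture-of-Experts routing step: a gating network scores every expert, but only the top-k experts actually run for a given token. This is a minimal, hypothetical sketch (the expert count, gate values, and scalar "experts" below are invented for illustration and do not reflect DeepSeek V3's actual routing):

```python
import math

def top_k_experts(gate_logits, k):
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    top = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    exps = [math.exp(gate_logits[i]) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_layer(token, experts, gate_logits, k):
    """Route a token through only k of len(experts) experts (sparse activation):
    parameters of the unselected experts contribute nothing to this token."""
    out = 0.0
    for idx, weight in top_k_experts(gate_logits, k):
        out += weight * experts[idx](token)
    return out

# Toy setup: 8 scalar "experts", but only 2 run per token.
experts = [lambda x, m=m: m * x for m in range(1, 9)]
gate_logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]
print(moe_layer(1.0, experts, gate_logits, k=2))  # only experts 4 and 1 execute
```

The same principle, scaled up, is why a 671B-parameter MoE model has inference cost closer to a 37B dense model: total parameter count and per-token compute are decoupled.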