The simplest way to self-host Qwen 2.5 72B. Launch a dedicated cloud GPU server running Laboratory OS to download and serve the model using any compatible app or framework.
Download model weights for local inference. Must be used with a compatible app, notebook, or codebase. May run slowly, or not work at all, depending on your system resources, particularly GPU(s) and available VRAM.
Qwen 2.5 72B is a large language model with 72.7B parameters and a 128K-token context window. It excels at long-form content generation, coding, and mathematical reasoning across more than 29 languages. The model uses grouped-query attention across its 80 layers and was trained on 18 trillion tokens, showing strong performance in knowledge-intensive tasks.
Qwen 2.5 72B represents a significant advancement in large language model capabilities, serving as the flagship model in the Qwen 2.5 family. As detailed in the official blog post, this decoder-only architecture comprises 72.7B parameters (70.0B non-embedding), making it one of the largest open-source language models available.
The model uses a transformer architecture incorporating RoPE, SwiGLU, and RMSNorm. It has 80 layers and uses grouped-query attention (GQA) with 64 query heads and 8 key-value heads. It also supports a context length of 131,072 tokens (128K), enabling it to process very long documents and conversations.
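The practical payoff of GQA is a smaller KV cache at inference time. A minimal sketch, using the layer and head counts above; the head dimension of 128 is an assumption (64 query heads × 128 = 8192 hidden size), not stated in this page:

```python
# Estimate KV-cache size for Qwen 2.5 72B's GQA layout:
# 80 layers, 8 key-value heads, 128K context, fp16/bf16 weights.
# head_dim = 128 is an assumption, not taken from this page.

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    # Two tensors (K and V) per layer, each [seq_len, n_kv_heads, head_dim]
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

gqa = kv_cache_bytes(80, 8, 128, 131_072)   # grouped-query attention (8 KV heads)
mha = kv_cache_bytes(80, 64, 128, 131_072)  # hypothetical full MHA (64 KV heads)

print(f"GQA KV cache at 128K context: {gqa / 2**30:.0f} GiB")  # 40 GiB
print(f"Full-MHA equivalent:          {mha / 2**30:.0f} GiB")  # 320 GiB
print(f"Reduction factor:             {mha // gqa}x")          # 8x
```

Under these assumptions, sharing each KV head across 8 query heads cuts the 128K-context cache from roughly 320 GiB to 40 GiB per sequence, an 8x reduction that is a large part of what makes the long context deployable at all.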
The model was trained on an impressive dataset of up to 18 trillion tokens, as documented in the GitHub repository. This extensive training has enabled robust multilingual capabilities across more than 29 languages, making it a versatile tool for global applications.
Qwen 2.5 72B demonstrates several key improvements over its predecessor, Qwen 2, including gains in knowledge, coding, mathematics, and instruction following.
The model shows competitive performance against other leading open-source models, including Llama-3.1-70B and Mistral-Large-V2. Perhaps most impressively, it demonstrates capabilities comparable to much larger models like Llama-3.1-405B, even in its base form without additional fine-tuning.
The Qwen 2.5 family includes variants at 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B parameters.
Each size comes in both base and instruction-tuned variants. While the 72B model represents the peak of performance, smaller variants like Qwen 2.5-3B have shown surprisingly strong capabilities, demonstrating improved knowledge density across the model family.
The base 72B model is not recommended for direct conversational applications without additional training. Post-training methods such as SFT, RLHF, or continued pretraining are suggested for conversational use cases. The model supports various deployment frameworks and quantization methods, including GPTQ and AWQ, to optimize performance and resource usage.
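As one illustration of such a deployment (not a recipe endorsed by this page), the instruction-tuned checkpoint can be served behind an OpenAI-compatible API with vLLM. The checkpoint ID, AWQ variant, and flag values below are assumptions and should be checked against current vLLM and Qwen documentation; a 72B model also typically needs multiple GPUs, hence the tensor-parallel flag:

```shell
# Serve an AWQ-quantized Qwen 2.5 72B Instruct checkpoint with vLLM.
# Checkpoint ID and flag values are illustrative assumptions.
vllm serve Qwen/Qwen2.5-72B-Instruct-AWQ \
  --quantization awq \
  --tensor-parallel-size 4 \
  --max-model-len 32768
```

Lowering `--max-model-len` below the model's full 128K context is a common way to trade context length for a smaller KV-cache memory footprint.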
The 72B variant is released under the Qwen license, while most other variants in the family (except 3B) are available under the Apache 2.0 license.