Model Report
Alibaba Cloud / Qwen 2 72B
Qwen2-72B is a 72.71 billion parameter Transformer-based language model developed by Alibaba Cloud, featuring Group Query Attention and SwiGLU activation functions. The model demonstrates strong performance across diverse benchmarks, including MMLU (84.2), HumanEval (64.6), and GSM8K (89.5), offers multilingual capabilities spanning 27 languages beyond English and Chinese, and supports context lengths of up to 128,000 tokens in its instruction-tuned variant.
Header image displaying the stylized Qwen2 logo, representing the Qwen2 series of large language models.
Qwen2-72B is a large-scale language model introduced as part of the Qwen2 series, released on June 7, 2024, as an evolution of the earlier Qwen1.5 models. Developed by the Qwen team at Alibaba Cloud, Qwen2-72B is a foundational (base) model intended for further post-training, including supervised fine-tuning and reinforcement learning from human feedback. The Qwen2 series encompasses models of varying sizes, with both base and instruction-tuned variants. Qwen2-72B, the largest dense model in the lineup, demonstrates strong performance across a diverse range of benchmarks, multilingual tasks, and extended-context scenarios, reflecting ongoing advancements in large language model research and engineering. For further details, refer to the official Qwen2 release announcement.
Architecture and Model Design
Qwen2-72B is built upon the Transformer architecture, incorporating several enhancements such as SwiGLU activation, attention QKV bias, and Group Query Attention (GQA). The use of GQA contributes to improved inference speed and reduced memory consumption. This dense model comprises approximately 72.71 billion parameters, of which 70.21 billion are non-embedding weights. Unlike some smaller Qwen2 variants, Qwen2-72B does not employ embedding tying. An updated tokenizer is integrated to support multiple natural languages and programming code, reflecting efforts to enhance multilingual and code understanding.
The Qwen2 model family includes both base and instruction-tuned models. Qwen2-72B itself is a dense, decoder-only language model, differentiating it from Mixture-of-Experts (MoE) architectures found in related models like Qwen2-57B-A14B. All models in the series utilize GQA, but only the smallest models (Qwen2-0.5B and Qwen2-1.5B) integrate tied embeddings.
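The grouped-query head layout and other architectural details described above can be confirmed by inspecting the published model configuration with the Hugging Face transformers library. The sketch below only reads metadata (nothing beyond the config file is downloaded); the repository identifier and printed fields follow the public Qwen2 packaging and should be verified against the current model card.

```python
from transformers import AutoConfig

# Fetch only the configuration for the base model (assumed Hugging Face repo id).
config = AutoConfig.from_pretrained("Qwen/Qwen2-72B")

# Group Query Attention: far fewer key/value heads than query heads.
print("query heads:     ", config.num_attention_heads)
print("key/value heads: ", config.num_key_value_heads)
print("hidden size:     ", config.hidden_size)
print("layers:          ", config.num_hidden_layers)
print("activation:      ", config.hidden_act)           # "silu" gate used inside SwiGLU
print("tied embeddings: ", config.tie_word_embeddings)  # False for the 72B model
```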
Training Data and Methods
Qwen2-72B and its instruction-tuned derivative, Qwen2-72B-Instruct, were trained on large, diverse datasets covering 27 languages beyond English and Chinese. Considerable emphasis was placed on increasing both the quality and volume of multilingual and instructional training data. The training regimen addressed linguistic challenges such as code-switching, leading to significant improvements in multilingual evaluation metrics.
For instruction tuning, models such as Qwen2-72B-Instruct underwent post-training to enhance human alignment and task-following capabilities. Alignment techniques included scalable supervised fine-tuning, reward modeling, and online Direct Preference Optimization (DPO). Automated strategies such as rejection sampling for mathematics, execution feedback for coding, back-translation for creative writing, and scalable oversight for role-play were employed to generate high-quality instructional data. An Online Merging Optimizer was implemented to reduce the so-called "alignment tax" during post-training.
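The specifics of Qwen2's online DPO pipeline and the Online Merging Optimizer are not spelled out here, but the core preference objective is the standard DPO loss. The sketch below is a generic, illustrative implementation of that loss (the function name and the beta value are arbitrary choices), not the Qwen team's code.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard pairwise DPO objective over summed per-response log-probabilities."""
    # How much more (or less) likely each response is under the policy vs. the frozen reference.
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps

    # Push the policy to prefer the chosen response over the rejected one, scaled by beta.
    margin = beta * (chosen_logratio - rejected_logratio)
    return -F.logsigmoid(margin).mean()
```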
Benchmark Performance
Comparison table showing Qwen2-72B benchmark scores relative to Llama3-70B, Mixtral-8x22B, and Qwen1.5-110B across multiple datasets.
Qwen2-72B exhibits robust performance across a broad array of benchmarks. Evaluations on standard datasets show results competitive with or superior to contemporary models such as Llama-3-70B and Mixtral-8x22B, as well as notable gains over its predecessor, Qwen1.5-110B, despite a smaller parameter count. Key benchmark results include high scores on MMLU (84.2), HumanEval (64.6), GSM8K (89.5), C-Eval (91.0), and the multilingual Multi-Mathematics aggregate (76.0). These figures reflect the model's capabilities in tasks ranging from general language understanding and mathematics to code generation.
The instruction-tuned variant, Qwen2-72B-Instruct, demonstrates further improvements in alignment and applied task performance.
Benchmark comparison of Qwen2-72B-Instruct, Llama-3-70B-Instruct, and Qwen1.5-72B-Chat, highlighting Qwen2-72B-Instruct's results across MMLU, HumanEval, and additional datasets.
On popular benchmarks, Qwen2-72B-Instruct attains scores such as 82.3 on MMLU, 86.0 on HumanEval, and 91.1 on GSM8K, closely matching or outperforming other leading models tested at similar scales.
Further analysis of domain-specific capabilities shows that Qwen2-72B-Instruct performs strongly across a range of programming languages and mathematics benchmarks.
Charts illustrating coding and mathematical performance of Qwen2-72B-Instruct compared to Llama3-70B-Instruct across both language and mathematics subdomains.
In addition to high raw scores, Qwen2-72B-Instruct demonstrates strong multilingual and context-handling capabilities. Human evaluation studies and benchmarking on M-MMLU and MGSM further confirm robust performance in language understanding and mathematical reasoning across multiple languages.
Extended Context and Long-Range Understanding
A distinctive feature of the Qwen2 series, including Qwen2-72B-Instruct, is its capacity to process exceptionally long contexts. The instruction-tuned models are trained on contexts of up to 32,000 tokens and extended to lengths as high as 128,000 tokens at inference time using techniques such as YaRN and Dual Chunk Attention.
Heatmaps from the Needle in a Haystack task, showing Qwen2-72B-Instruct achieving 100% accuracy in retrieving facts across up to 128,000 token contexts.
Heatmap-based evaluation from the Needle in a Haystack benchmark demonstrates that Qwen2-72B-Instruct maintains perfect retrieval accuracy for factual information even at 128,000 tokens, indicating robust long-range memory and information extraction. This positions the model for use cases demanding precise recall across extensive documents.
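In practice, serving the instruction-tuned model beyond its 32K training context involves enabling YaRN-style rope scaling in the checkpoint configuration before loading it in an inference engine such as vLLM. The snippet below is a sketch of that edit; the local path is hypothetical, and the scaling values (factor 4.0 over a 32,768-token base) follow the guidance published with the Qwen2 release and should be checked against the current model card.

```python
import json
from pathlib import Path

# Hypothetical local path to a downloaded Qwen2-72B-Instruct checkpoint.
config_path = Path("Qwen2-72B-Instruct/config.json")
config = json.loads(config_path.read_text())

# Enable YaRN rope scaling so the 32K training context can be extrapolated
# to roughly 128K tokens at inference time (32768 * factor 4.0).
config["rope_scaling"] = {
    "type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}

config_path.write_text(json.dumps(config, indent=2))
```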
Applications, Limitations, and Model Family
Qwen2-72B serves primarily as a base model for specialized post-training, including instruction fine-tuning and reinforcement learning workflows. The instruction-tuned Qwen2-72B-Instruct performs well on reasoning, code generation, mathematics, multilingual understanding, and information extraction from long texts. As a general-purpose model, it can be applied to language generation, code synthesis, educational tasks, and research scenarios that require large-context comprehension and multilingual support.
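For the applied scenarios above, a minimal way to exercise the instruction-tuned variant is through the transformers chat-template interface. This is a sketch assuming the public "Qwen/Qwen2-72B-Instruct" repository and enough GPU memory for the 72B weights (multiple high-VRAM GPUs); quantized builds or hosted endpoints are common alternatives.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2-72B-Instruct"  # instruction-tuned variant; the base model is not chat-aligned
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the key ideas behind Group Query Attention."},
]

# Render the conversation with the model's chat template, then generate a reply.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```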
Within the Qwen2 family, models range in scale from 0.5B to 72B parameters, encompassing both dense and MoE architectures. Context length capabilities vary from 32K to 128K tokens for instruction-tuned versions, depending on model size.
The primary limitation of Qwen2-72B is that, as a base model, it is intended for further post-training and is not recommended for direct deployment in text-generation applications without additional alignment. Licensing for Qwen2-72B is governed by the Qianwen License, while other members of the family are typically released under the Apache 2.0 open-source license.