The simplest way to self-host Qwen-2 72B. Launch a dedicated cloud GPU server running Lab Station OS to download and serve the model using any compatible app or framework.
Download model weights for local inference. Must be used with a compatible app, notebook, or codebase. May run slowly, or not work at all, depending on your system resources, particularly GPU(s) and available VRAM.
Qwen-2 72B is a large language model featuring a 128K-token context window and a Grouped Query Attention (GQA) architecture. It excels in multilingual tasks across 27 languages and demonstrates strong performance on reasoning, coding, and mathematics benchmarks. It is notable for its Online Merging Optimizer and efficient training methodology.
Architecturally, Qwen-2 72B is a dense Transformer with several modern refinements: SwiGLU activation, attention QKV bias, and Grouped Query Attention (GQA), the last of which is adopted to speed up inference and reduce KV-cache memory use. These choices mark a shift from Qwen 1.5 and show how the family's design has evolved.
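As a rough illustration, most of these architectural choices can be read directly off the published model configuration. The sketch below assumes the Hugging Face repo id Qwen/Qwen2-72B and a recent Transformers release; it only inspects the config, so no weights are downloaded.

```python
from transformers import AutoConfig

# Sketch: inspect the published Qwen2-72B config to see the architecture
# described above (repo id assumed to be "Qwen/Qwen2-72B").
config = AutoConfig.from_pretrained("Qwen/Qwen2-72B")

print(config.hidden_act)            # "silu" -- the gate activation used by SwiGLU
print(config.num_attention_heads)   # number of query heads
print(config.num_key_value_heads)   # fewer KV heads than query heads => GQA
print(config.hidden_size, config.num_hidden_layers)
```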
The instruction-tuned variant (Qwen2-72B-Instruct) supports a context length of up to 128K tokens. This extended context handling is achieved through techniques such as YaRN and Dual Chunk Attention. The model also ships with an improved tokenizer designed to handle multiple natural languages and code effectively, enhancing its versatility across different types of content.
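Long-context use beyond the default window is typically enabled through a rope_scaling entry in the model configuration. The sketch below shows one way to set YaRN-style scaling via Transformers; the factor and field values mirror those published in the Qwen2-72B-Instruct model card and should be treated as illustrative, not definitive.

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Sketch: extend the usable context of the instruct model with YaRN rope
# scaling. A factor of 4.0 over a 32K base window (~128K tokens) follows the
# values given in the Qwen2-72B-Instruct model card; adjust as needed.
config = AutoConfig.from_pretrained("Qwen/Qwen2-72B-Instruct")
config.rope_scaling = {
    "type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-72B-Instruct",
    config=config,
    device_map="auto",   # spread the 72B weights across available GPUs
    torch_dtype="auto",
)
```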
Qwen-2 72B's training data spans 27 languages in addition to English and Chinese, with particular attention paid to improving code-switching behavior. Post-training combined supervised fine-tuning, reward model training, and online DPO training. A notable addition was the Online Merging Optimizer, designed specifically to minimize alignment tax.
The base model is not recommended for direct text generation. Instead, users are advised to apply post-training techniques such as SFT, RLHF, or continued pre-training to adapt it to specific applications, which yields better performance and more controlled output.
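For actual text generation, the instruction-tuned checkpoint with its chat template is the intended entry point. A minimal sketch, assuming the Qwen/Qwen2-72B-Instruct repo id and enough GPU memory to hold the weights, might look like this:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2-72B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto"
)

# Build the prompt with the model's chat template rather than raw text,
# since the base model is not meant for direct generation.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain grouped query attention in two sentences."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```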
Qwen-2 72B has demonstrated strong performance across a wide range of benchmarks, often surpassing both open-source and proprietary competitors, with evaluations covering natural language understanding, coding, mathematics, and multilingual tasks.
The model has shown particularly strong results compared to its predecessor, Qwen1.5-110B, and has demonstrated competitive performance against other leading models like Llama-3-70B. These improvements are attributed to enhanced dataset quality and optimized training methodologies.
The Qwen-2 family includes several models of varying sizes: Qwen2-0.5B, Qwen2-1.5B, Qwen2-7B, Qwen2-57B-A14B (a Mixture-of-Experts variant), and Qwen2-72B.
While all models in the family share core architectural elements such as GQA, the smaller variants add further optimizations such as tied input and output embeddings to save parameters. The 72B model is the largest and most capable member of the family, though each variant is optimized for different use cases and computational budgets.
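Whether a given checkpoint ties its embeddings is also visible in its configuration. A small sketch, assuming the Qwen/Qwen2-0.5B and Qwen/Qwen2-72B repo ids:

```python
from transformers import AutoConfig

# Compare the embedding-tying flag across family members: smaller Qwen2
# checkpoints tie input and output embeddings to save parameters, while the
# 72B model keeps them separate.
for repo in ("Qwen/Qwen2-0.5B", "Qwen/Qwen2-72B"):
    cfg = AutoConfig.from_pretrained(repo)
    print(repo, "tie_word_embeddings =", cfg.tie_word_embeddings)
```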
The Qwen-2 72B model is released under the "tongyi-qianwen" license, while smaller models in the family use the Apache 2.0 license. The model is available through Hugging Face and requires version 4.37.0 or later of the Transformers library; older releases do not recognize the qwen2 architecture.
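A quick guard against running an older Transformers release, as a sketch:

```python
from packaging import version
import transformers

# Qwen2 model classes were added in Transformers 4.37.0; earlier releases
# fail to resolve the "qwen2" architecture when loading the checkpoint.
if version.parse(transformers.__version__) < version.parse("4.37.0"):
    raise RuntimeError(
        f"Transformers {transformers.__version__} is too old for Qwen2; "
        "upgrade to 4.37.0 or later."
    )
```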