The simplest way to self-host Qwen 2.5 Coder 32B. Launch a dedicated cloud GPU server running Lab Station OS to download and serve the model using any compatible app or framework.
Download model weights for local inference. Must be used with a compatible app, notebook, or codebase. May run slowly, or not work at all, depending on your system resources, particularly GPU(s) and available VRAM.
Qwen 2.5 Coder 32B is a 32.5B parameter code-focused language model trained on 5.5T tokens across 92 programming languages. It features a 128K token context window and uses RoPE, SwiGLU, and YaRN architecture. Notable for strong code generation, debugging, and maintenance capabilities compared to smaller models in its family.
Qwen 2.5 Coder 32B is the largest and most capable model in the Qwen 2.5 Coder family, a series of specialized coding language models developed by Alibaba Cloud. Released in November 2024, it demonstrates coding capabilities comparable to GPT-4o while maintaining strong performance in mathematical and general reasoning tasks.
The model features a decoder-only transformer architecture incorporating several key components including RoPE, SwiGLU, RMSNorm, and Attention QKV bias. With 32.5 billion parameters (31.0B non-embedding) spread across 64 layers, it represents a significant scaling up from its smaller siblings in the model family, which range from 0.5B to 14B parameters.
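As a quick sanity check, these architectural parameters can be read directly from the published model configuration. The sketch below assumes the Hugging Face Hub repository name Qwen/Qwen2.5-Coder-32B and uses attribute names from the transformers Qwen2 configuration class.

```python
from transformers import AutoConfig

# Assumed Hub repo name for the base 32B model
config = AutoConfig.from_pretrained("Qwen/Qwen2.5-Coder-32B")

# Key architectural parameters discussed above
print(config.num_hidden_layers)        # number of decoder layers (64)
print(config.hidden_size)              # model width
print(config.num_attention_heads)      # query heads
print(config.num_key_value_heads)      # key/value heads (grouped-query attention)
print(config.max_position_embeddings)  # default context length (32,768)
print(config.rope_theta)               # RoPE base frequency
```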
A notable technical achievement is the model's support for long-context processing of up to 128K tokens, though the default configuration is set to 32,768 tokens. For sequences beyond that default, the model can use YaRN (Yet another RoPE extensioN), a technique for efficient context window extension, though enabling it may reduce performance on shorter sequences.
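As a rough sketch of how this extension is typically enabled with transformers, the rope_scaling entry can be overridden on the loaded configuration before instantiating the model. The repository name is assumed, and the factor of 4.0 (32,768 × 4 ≈ 131K tokens) follows the common YaRN recipe documented for Qwen models.

```python
from transformers import AutoConfig, AutoModelForCausalLM

model_name = "Qwen/Qwen2.5-Coder-32B-Instruct"  # assumed Hub repo name

# Override the RoPE scaling settings to enable YaRN-based context extension.
# A factor of 4.0 stretches the 32,768-token default toward ~128K tokens.
config = AutoConfig.from_pretrained(model_name)
config.rope_scaling = {
    "type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    config=config,
    torch_dtype="auto",
    device_map="auto",
)
```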
The model underwent extensive training on approximately 5.5 trillion tokens spanning three main categories: source code, text-code grounding data, and synthetic data.
This comprehensive training approach has resulted in significant improvements over its predecessor, CodeQwen 1.5, particularly in code generation, code reasoning, and code fixing.
The model supports an impressive 92 programming languages, making it highly versatile for various development scenarios. As detailed in the Qwen 2.5 Coder technical report, the model achieves state-of-the-art performance among open-source code LLMs.
The Qwen 2.5 Coder family includes six models with varying parameter sizes: 0.5B, 1.5B, 3B, 7B, 14B, and 32B.
Each model is available in both base and instruction-tuned variants, with the latter indicated by the "-Instruct" suffix. The instruction-tuned versions demonstrate improved performance in multi-programming scenarios (across 40+ languages) and enhanced code reasoning capabilities as measured by CRUXEval.
For deployment flexibility, several quantized versions are available, including GPTQ-Int4, GPTQ-Int8, AWQ, and GGUF, making the model accessible on resource-constrained devices while maintaining reasonable performance.
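As an illustrative sketch, a pre-quantized checkpoint can be loaded through the same transformers interface; the GPTQ-Int4 repository name below follows the Qwen collection's naming convention and is an assumption, as is the presence of a GPTQ backend (optimum with auto-gptq or gptqmodel) in the environment.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo name for the 4-bit GPTQ variant of the instruct model
model_name = "Qwen/Qwen2.5-Coder-32B-Instruct-GPTQ-Int4"

# Quantization parameters are read from the checkpoint's own config,
# so no extra quantization arguments are needed at load time.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```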
The model requires Hugging Face's transformers library (version 4.37.0 or later) for inference. While the base model is primarily designed for code generation tasks, the documentation advises using the instruction-tuned variants for conversational applications.
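A minimal generation example with the instruction-tuned variant might look like the following. It mirrors the standard transformers chat workflow (apply_chat_template followed by generate); the repository name, system prompt, and user prompt are chosen here purely for illustration.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-Coder-32B-Instruct"  # assumed Hub repo name

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a quicksort function in Python."},
]

# Render the chat template and generate a completion
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=512)

# Strip the prompt tokens and decode only the newly generated text
new_tokens = output_ids[0][inputs.input_ids.shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```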
The model is released under the Apache 2.0 license, making it accessible for both research and commercial applications.