The simplest way to self-host Xwin 70B. Launch a dedicated cloud GPU server running Lab Station OS to download and serve the model using any compatible app or framework.
Download model weights for local inference. Must be used with a compatible app, notebook, or codebase. May run slowly, or not work at all, depending on your system resources, particularly GPU(s) and available VRAM.
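As a rough way to gauge whether local inference is feasible (an estimate, not an official requirement), weight memory alone is approximately parameter count × bytes per parameter, before the KV cache and activation overhead. The sketch below applies that rule of thumb to a 70B model at a few common precisions:

```python
def weight_memory_gib(n_params_billion: float, bytes_per_param: float) -> float:
    """Rough lower bound on VRAM needed just to hold the model weights."""
    return n_params_billion * 1e9 * bytes_per_param / 1024**3

# 70B parameters at common precisions (KV cache and activations add more):
for label, bytes_pp in [("fp16", 2), ("int8", 1), ("int4", 0.5)]:
    print(f"{label}: ~{weight_memory_gib(70, bytes_pp):.0f} GiB")
# fp16: ~130 GiB, int8: ~65 GiB, int4: ~33 GiB
```

In practice this means full-precision local inference for a 70B model requires multiple high-VRAM GPUs, while quantized variants can fit on smaller setups at some cost in quality.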
Xwin-LM-70B is a Llama 2-based model that combines multiple alignment techniques, including RLHF, supervised fine-tuning, and rejection sampling. Notable for surpassing GPT-4 on the AlpacaEval benchmark with a 60.61% win rate, it shows strong performance on reasoning and comprehension tasks.
Xwin-LM-70B-V0.1 is built on the Llama 2 base models and is the flagship of the Xwin-LM series. Its development centers on model alignment, combining several techniques into a single training pipeline.

That alignment pipeline combines supervised fine-tuning (SFT), reward models (RM), rejection sampling, and reinforcement learning from human feedback (RLHF). This layered approach to alignment is the main driver of the model's benchmark results.
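To make the rejection-sampling step concrete, here is a minimal best-of-N sketch under stated assumptions: `generate_candidate` and `reward_model_score` are hypothetical stand-ins for the policy model and the learned reward model, not Xwin-LM's actual training code. The idea is to sample several completions per prompt, score each with the reward model, and keep only the highest-scoring one:

```python
import random

def generate_candidate(prompt: str) -> str:
    """Hypothetical stand-in for sampling one completion from the policy model."""
    return f"candidate response to {prompt!r} #{random.randint(0, 999)}"

def reward_model_score(prompt: str, response: str) -> float:
    """Hypothetical stand-in for the learned reward model (RM)."""
    return random.random()

def rejection_sample(prompt: str, n_candidates: int = 8) -> str:
    """Best-of-N rejection sampling: draw several candidates and keep
    the one the reward model scores highest."""
    candidates = [generate_candidate(prompt) for _ in range(n_candidates)]
    return max(candidates, key=lambda r: reward_model_score(prompt, r))

print(rejection_sample("Explain rejection sampling in one sentence."))
```

The retained high-reward responses can then be fed back into further supervised fine-tuning, which is what makes rejection sampling a useful complement to RLHF.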
A distinguishing result for Xwin-LM-70B-V0.1 is its performance on the AlpacaEval benchmark: a 95.57% win rate against Text-Davinci-003 and, most notably, a 60.61% win rate against GPT-4, making it the first model to surpass GPT-4 on that benchmark.
The Xwin-LM family includes three main variants:

- Xwin-LM-7B-V0.1
- Xwin-LM-13B-V0.1
- Xwin-LM-70B-V0.1
While all models in the family perform strongly, benchmark results consistently show the 70B variant leading in overall capability. Despite their smaller parameter counts, the 7B and 13B models still rank highly among similarly sized models on AlpacaEval, and this consistent scaling across sizes points to the effectiveness of the Xwin-LM training approach.
The model family has been evaluated across various NLP foundation tasks, including MMLU, ARC, TruthfulQA, and HellaSwag, with the 70B variant consistently showing superior performance compared to its smaller counterparts.
For practical use, Xwin-LM-70B-V0.1 supports inference through the Hugging Face Transformers library and through vLLM for higher-throughput serving. The model is released under the Llama 2 license, which governs its usage and distribution.
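Below is a minimal Transformers sketch. The Vicuna-style prompt follows the format published on the Xwin-LM model card; the device placement, dtype, and sampling settings are assumptions you should adapt to your hardware:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Xwin-LM/Xwin-LM-70B-V0.1"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
# device_map="auto" shards the 70B weights across available GPUs;
# torch_dtype="auto" loads the checkpoint in its native precision.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, device_map="auto", torch_dtype="auto"
)

# Xwin-LM expects a Vicuna-style conversation template.
prompt = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the "
    "user's questions. USER: Hello, who are you? ASSISTANT:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(
    **inputs, max_new_tokens=256, do_sample=True, temperature=0.7
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
))
```

With vLLM the pattern is similar; the `tensor_parallel_size=4` below assumes a four-GPU node and should match your setup:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="Xwin-LM/Xwin-LM-70B-V0.1", tensor_parallel_size=4)
outputs = llm.generate([prompt], SamplingParams(temperature=0.7, max_tokens=256))
print(outputs[0].outputs[0].text)
```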
The development team has indicated plans for future releases, with a particular focus on strengthening the model's mathematical and reasoning skills.