Model Report
NousResearch / Nous Hermes Mixtral 8X7B DPO
Nous Hermes Mixtral 8X7B DPO is a large language model developed by NousResearch using the Mixtral 8x7B Mixture of Experts architecture with approximately 46.7 billion parameters. The model combines supervised fine-tuning and direct preference optimization (DPO) training on over one million entries of GPT-4-generated and open-source data. It demonstrates capabilities in code generation, creative writing, and conversational tasks while supporting ChatML prompt formatting and various quantized deployment options.
Nous Hermes 2 Mixtral 8x7B DPO is a large language model developed by Nous Research, built on the Mixture of Experts (MoE) architecture known as Mixtral 8x7B. The model combines supervised fine-tuning with direct preference optimization (DPO) to enhance its reasoning, text generation, and dialogue capabilities. Trained on a substantial dataset composed mainly of GPT-4-generated and high-quality open-source data, it is evaluated across a diverse set of benchmarks and tasks and sits within the broader Mixtral family of models.
Model Architecture and Training
The foundation of Nous Hermes 2 Mixtral 8x7B DPO is the Mixtral 8x7B MoE large language model, which uses a Mixture of Experts approach: each layer contains eight expert feed-forward networks, and a learned router activates only two of them for every token. The architecture comprises approximately 46.7 billion parameters in total, of which only a fraction is active per token, balancing computational efficiency with capacity for complex reasoning.
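As a rough illustration of this routing behavior, the sketch below implements top-2 expert selection in the style of a Mixtral MoE layer. It is a simplified, self-contained example rather than the model's actual implementation: real Mixtral experts are gated feed-forward blocks, not single linear layers, and the batch size and dimensions here are placeholders.

```python
import torch
import torch.nn.functional as F

# Simplified top-2 expert routing in the style of a Mixtral MoE layer.
# Real experts are gated feed-forward blocks; single Linear layers are used
# here purely for illustration.
hidden_dim, num_experts, top_k = 4096, 8, 2
tokens = torch.randn(10, hidden_dim)               # 10 token embeddings
router = torch.nn.Linear(hidden_dim, num_experts)  # gating network
experts = [torch.nn.Linear(hidden_dim, hidden_dim) for _ in range(num_experts)]

logits = router(tokens)                        # (10, 8) routing scores
weights, chosen = logits.topk(top_k, dim=-1)   # two experts per token
weights = F.softmax(weights, dim=-1)           # normalize the two gate weights

# Each token's output is the gate-weighted sum of its two chosen experts,
# so only a fraction of the expert parameters is used per token.
out = torch.zeros_like(tokens)
for k in range(top_k):
    for e in range(num_experts):
        mask = chosen[:, k] == e
        if mask.any():
            out[mask] += weights[mask, k:k + 1] * experts[e](tokens[mask])
```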
For training, Nous Hermes 2 Mixtral 8x7B DPO was exposed to over one million entries, with a focus on maximizing the diversity and quality of its learning materials. The data mix includes substantial synthetic data generated by GPT-4, supplemented with verified open datasets. Two principal training stages were employed: supervised fine-tuning (SFT), which aligns the model with high-quality human-like responses, and DPO, which optimizes the model directly on preference data without training a separate reward model. This two-phase approach yielded the SFT+DPO version, and a comparative SFT-only model was also released.
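For intuition about the DPO stage, the snippet below sketches the DPO loss on a single preference pair. It assumes precomputed log-probabilities of the chosen and rejected responses under the policy being trained and under a frozen reference (typically the SFT) model; the numeric values and the beta setting are illustrative only.

```python
import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Direct Preference Optimization loss for one (chosen, rejected) pair.

    The log-probabilities are summed over response tokens; beta scales the
    implicit reward derived from the policy/reference log-ratio.
    """
    chosen_reward = beta * (logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (logp_rejected - ref_logp_rejected)
    # Maximize the margin between the chosen and rejected responses.
    return -F.logsigmoid(chosen_reward - rejected_reward)

# Placeholder log-probabilities purely for illustration.
loss = dpo_loss(torch.tensor(-42.0), torch.tensor(-55.0),
                torch.tensor(-44.0), torch.tensor(-57.0))
print(loss.item())
```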
Benchmark Performance
Evaluation results for Nous Hermes 2 Mixtral 8x7B DPO are reported across several established benchmark suites. On the GPT4All suite, the model achieves an average score of 75.70, with per-task results such as ARC Challenge (accuracy: 0.5990) and BoolQ (accuracy: 0.8783), as detailed on the model's Hugging Face repository.
On AGIEval, which emphasizes logical reasoning and academic aptitude, the DPO model reports an average accuracy of 46.05. BigBench, a comprehensive reasoning suite, yields an average score of 49.70. Comparative figures against the base Mixtral 8x7B are reported for overall averages and for individual tasks such as MMLU and ARC, allowing the effect of DPO training to be assessed.
BigBench benchmark chart highlighting the performance of Nous Hermes 2 Mixtral 8x7B DPO and related models across reasoning tasks.
Performance comparison between Nous Hermes 2 Mixtral 8x7B DPO and Mixtral-8x7B-Instruct-v0.1 across multiple benchmarks.
Capabilities and Applications
The Nous Hermes 2 Mixtral 8x7B DPO model supports a range of applications. The model is proficient in generating and refining programming code, as demonstrated by its ability to produce Python scripts for data visualization and to iteratively adapt its responses based on user feedback.
Example of the model generating and iteratively modifying Python code for data visualization in response to user prompts.
In addition to task-oriented code generation, the model can also handle creative tasks, such as composing genre-specific poetry with complex constraints. For instance, it can synthesize themes like machine learning, psychedelics, and quantum mechanics in the style of Shakespeare with a cyberpunk aesthetic.
Sample creative output: the model generates a Shakespearean cyberpunk poem based on a detailed prompt.
Further, the model is capable of prompt engineering and text manipulation, including backtranslation—transforming detailed input text into structured prompts suitable for downstream large language models.
Demonstration of prompt backtranslation: the model condenses a detailed input into a concise instruction for LLMs.
Nous Hermes 2 Mixtral 8x7B DPO adopts the ChatML prompt format, which structures interactions using system, user, and assistant roles. This prompt structure supports coherent multi-turn dialogue and allows control over conversational context, function definitions, and stylistic guidance. Prompt formatting is compatible with Hugging Face's tokenizer.apply_chat_template() utility, supporting integration in chat-oriented applications.
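A minimal sketch of building a ChatML prompt with apply_chat_template is shown below. The repository identifier and example messages are illustrative; the exact template is defined by the tokenizer shipped with the checkpoint being used.

```python
from transformers import AutoTokenizer

# Illustrative repository id; substitute the checkpoint you are actually using.
tokenizer = AutoTokenizer.from_pretrained("NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO")

messages = [
    {"role": "system", "content": "You are Hermes 2, a helpful assistant."},
    {"role": "user", "content": "Write a haiku about mixture-of-experts models."},
]

# Renders the ChatML turns (<|im_start|>role ... <|im_end|>) and appends the
# assistant prefix so generation continues in the assistant role.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```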
Deployment, Quantization, and Limitations
The model is available in a range of quantized formats, including GGUF, GPTQ, AWQ, and MLX 4-bit versions, providing options for deployments with varying constraints and computing resources. Quantized models are released by both Nous Research and community contributors, covering SFT+DPO and SFT-only configurations.
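As one example of running a quantized build, the sketch below loads a GGUF file with llama-cpp-python; the file name, quantization level, context length, and GPU offload settings are assumptions to adapt to the files actually downloaded.

```python
from llama_cpp import Llama

# Hypothetical local GGUF path; pick the quantization that fits your hardware.
llm = Llama(
    model_path="./nous-hermes-2-mixtral-8x7b-dpo.Q4_K_M.gguf",
    n_ctx=4096,            # context window for this session
    n_gpu_layers=-1,       # offload all layers to GPU if VRAM allows
    chat_format="chatml",  # the prompt format this model expects
)

reply = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are Hermes 2, a helpful assistant."},
        {"role": "user", "content": "Explain direct preference optimization in two sentences."},
    ],
    max_tokens=256,
)
print(reply["choices"][0]["message"]["content"])
```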
Despite quantization, hardware requirements remain substantial: inference typically calls for more than 24 GB of VRAM even at 4-bit precision. Deployment relies on standard deep learning and transformer libraries, and the model is intended for environments that can accommodate its memory and compute demands.
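For the unquantized weights served through Hugging Face Transformers, loading in 4-bit via bitsandbytes is one way to approach these memory demands; the sketch below is illustrative, and the repository identifier and generation settings are assumptions rather than official recommendations.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO"  # illustrative repo id

# 4-bit quantization config; compute in fp16 to keep generation reasonably fast.
quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # spread layers across available GPUs / CPU RAM
)

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Summarize mixture-of-experts routing in two sentences."}],
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```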
Further information on setup instructions and optimal generation settings is available via the official model documentation.
Model Development and Related Work
Training of Nous Hermes 2 Mixtral 8x7B DPO was carried out with compute resources sponsored by Together.ai, and the model was built using the Axolotl training framework. Its architecture, data pipeline, and benchmark results place it within the broader lineup of Mixtral derivatives, alongside the SFT-only version and the base Mixtral 8x7B model. Comparative results are available for users interested in evaluating the impact of DPO training relative to standard supervised fine-tuning.