Model Report
NousResearch / Nous Hermes Mixtral 8X7B DPO
Nous Hermes Mixtral 8X7B DPO is a large language model developed by NousResearch using the Mixtral 8x7B Mixture of Experts architecture with approximately 46.7 billion parameters. The model combines supervised fine-tuning and direct preference optimization (DPO) training on over one million entries of GPT-4-generated and open-source data. It demonstrates capabilities in code generation, creative writing, and conversational tasks while supporting ChatML prompt formatting and various quantized deployment options.
Nous Hermes 2 Mixtral 8x7B DPO is a large language model developed by Nous Research, built on the Mixture of Experts (MoE) architecture known as Mixtral 8x7B. The model combines supervised fine-tuning with direct preference optimization (DPO) to enhance its reasoning, text generation, and dialogue capabilities. Trained on a substantial dataset composed mainly of GPT-4-generated and high-quality open-source data, it is evaluated across a diverse set of benchmarks and tasks and sits within the broader Mixtral family of models.
Model Architecture and Training
The foundation of Nous Hermes 2 Mixtral 8x7B DPO is the Mixtral 8x7B MoE large language model, which uses a Mixture of Experts approach: each layer contains eight expert feed-forward networks, and a learned router activates only two of them for every token. The architecture comprises approximately 46.7 billion parameters in total, of which only a fraction is active per token, balancing computational efficiency with capacity for complex reasoning.
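As a rough illustration of this routing behavior, the sketch below implements top-2 expert selection in the style of a Mixtral MoE layer. It is a simplified, self-contained example rather than the model's actual implementation: real Mixtral experts are gated feed-forward blocks, not single linear layers, and the batch size and dimensions here are placeholders.

```python
import torch
import torch.nn.functional as F

# Simplified top-2 expert routing in the style of a Mixtral MoE layer.
# Real experts are gated feed-forward blocks; single Linear layers are used
# here purely for illustration.
hidden_dim, num_experts, top_k = 4096, 8, 2
tokens = torch.randn(10, hidden_dim)               # 10 token embeddings
router = torch.nn.Linear(hidden_dim, num_experts)  # gating network
experts = [torch.nn.Linear(hidden_dim, hidden_dim) for _ in range(num_experts)]

logits = router(tokens)                        # (10, 8) routing scores
weights, chosen = logits.topk(top_k, dim=-1)   # two experts per token
weights = F.softmax(weights, dim=-1)           # normalize the two gate weights

# Each token's output is the gate-weighted sum of its two chosen experts,
# so only a fraction of the expert parameters is used per token.
out = torch.zeros_like(tokens)
for k in range(top_k):
    for e in range(num_experts):
        mask = chosen[:, k] == e
        if mask.any():
            out[mask] += weights[mask, k:k + 1] * experts[e](tokens[mask])
```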
For training, Nous Hermes 2 Mixtral 8x7B DPO was exposed to over one million entries, with a focus on maximizing the diversity and quality of its learning materials. The data mix includes substantial synthetic data generated by GPT-4, supplemented with verified open datasets. Two principal training stages were employed: supervised fine-tuning (SFT), which aligns the model with high-quality human-like responses, and DPO, which optimizes the model directly on preference data without training a separate reward model. This two-phase approach yielded the SFT+DPO version, and a comparative SFT-only model was also released.
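For intuition about the DPO stage, the snippet below sketches the DPO loss on a single preference pair. It assumes precomputed log-probabilities of the chosen and rejected responses under the policy being trained and under a frozen reference (typically the SFT) model; the numeric values and the beta setting are illustrative only.

```python
import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Direct Preference Optimization loss for one (chosen, rejected) pair.

    The log-probabilities are summed over response tokens; beta scales the
    implicit reward derived from the policy/reference log-ratio.
    """
    chosen_reward = beta * (logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (logp_rejected - ref_logp_rejected)
    # Maximize the margin between the chosen and rejected responses.
    return -F.logsigmoid(chosen_reward - rejected_reward)

# Placeholder log-probabilities purely for illustration.
loss = dpo_loss(torch.tensor(-42.0), torch.tensor(-55.0),
                torch.tensor(-44.0), torch.tensor(-57.0))
print(loss.item())
```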
Benchmark Performance
Evaluation results for Nous Hermes 2 Mixtral 8x7B DPO are reported across several established benchmark suites. On the GPT4All suite, the model achieves an average score of 75.70, with per-task results such as ARC Challenge (accuracy: 0.5990) and BoolQ (accuracy: 0.8783), as detailed on the model's Hugging Face repository.
On AGIEval, which emphasizes logical reasoning and academic aptitude, the DPO model reports an average accuracy of 46.05. BigBench, a comprehensive reasoning suite, yields an average score of 49.70. Comparative figures against the base Mixtral 8x7B are reported for overall averages and for individual tasks such as MMLU and ARC, allowing the effect of DPO training to be assessed.
BigBench benchmark chart highlighting the performance of Nous Hermes 2 Mixtral 8x7B DPO and related models across reasoning tasks.
Performance comparison between Nous Hermes 2 Mixtral 8x7B DPO and Mixtral-8x7B-Instruct-v0.1 across multiple benchmarks.
Capabilities and Applications
The Nous Hermes 2 Mixtral 8x7B DPO model supports a range of applications. The model is proficient in generating and refining programming code, as demonstrated by its ability to produce Python scripts for data visualization and to iteratively adapt its responses based on user feedback.
Example of the model generating and iteratively modifying Python code for data visualization in response to user prompts.
In addition to task-oriented code generation, the model can also handle creative tasks, such as composing genre-specific poetry with complex constraints. For instance, it can synthesize themes like machine learning, psychedelics, and quantum mechanics in the style of Shakespeare with a cyberpunk aesthetic.
Sample creative output: the model generates a Shakespearean cyberpunk poem based on a detailed prompt.
Further, the model is capable of prompt engineering and text manipulation, including backtranslation—transforming detailed input text into structured prompts suitable for downstream large language models.
Demonstration of prompt backtranslation: the model condenses a detailed input into a concise instruction for LLMs.
Nous Hermes 2 Mixtral 8x7B DPO adopts the ChatML prompt format, which structures interactions using system, user, and assistant roles. This prompt structure supports coherent multi-turn dialogue and allows control over conversational context, function definitions, and stylistic guidance. Prompt formatting is compatible with Hugging Face's tokenizer.apply_chat_template() utility, supporting integration in chat-oriented applications.
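A minimal sketch of building a ChatML prompt with apply_chat_template is shown below. The repository identifier and example messages are illustrative; the exact template is defined by the tokenizer shipped with the checkpoint being used.

```python
from transformers import AutoTokenizer

# Illustrative repository id; substitute the checkpoint you are actually using.
tokenizer = AutoTokenizer.from_pretrained("NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO")

messages = [
    {"role": "system", "content": "You are Hermes 2, a helpful assistant."},
    {"role": "user", "content": "Write a haiku about mixture-of-experts models."},
]

# Renders the ChatML turns (<|im_start|>role ... <|im_end|>) and appends the
# assistant prefix so generation continues in the assistant role.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```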
Deployment, Quantization, and Limitations
The model is available in a range of quantized formats, including GGUF, GPTQ, AWQ, and MLX 4-bit versions, providing options for deployments with varying constraints and computing resources. Quantized models are released by both Nous Research and community contributors, covering SFT+DPO and SFT-only configurations.
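As one example of running a quantized build, the sketch below loads a GGUF file with llama-cpp-python; the file name, quantization level, context length, and GPU offload settings are assumptions to adapt to the files actually downloaded.

```python
from llama_cpp import Llama

# Hypothetical local GGUF path; pick the quantization that fits your hardware.
llm = Llama(
    model_path="./nous-hermes-2-mixtral-8x7b-dpo.Q4_K_M.gguf",
    n_ctx=4096,            # context window for this session
    n_gpu_layers=-1,       # offload all layers to GPU if VRAM allows
    chat_format="chatml",  # the prompt format this model expects
)

reply = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are Hermes 2, a helpful assistant."},
        {"role": "user", "content": "Explain direct preference optimization in two sentences."},
    ],
    max_tokens=256,
)
print(reply["choices"][0]["message"]["content"])
```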
Despite quantization, hardware requirements remain substantial: inference typically calls for more than 24 GB of VRAM even at 4-bit precision. Deployment relies on standard deep learning and transformer libraries, and the model is intended for environments that can accommodate its memory and compute demands.
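For the unquantized weights served through Hugging Face Transformers, loading in 4-bit via bitsandbytes is one way to approach these memory demands; the sketch below is illustrative, and the repository identifier and generation settings are assumptions rather than official recommendations.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO"  # illustrative repo id

# 4-bit quantization config; compute in fp16 to keep generation reasonably fast.
quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # spread layers across available GPUs / CPU RAM
)

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Summarize mixture-of-experts routing in two sentences."}],
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```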
Further information on setup instructions and optimal generation settings is available via the official model documentation.
Model Development and Related Work
Training of Nous Hermes 2 Mixtral 8x7B DPO was carried out with compute resources sponsored by Together.ai, and the model was built using the Axolotl training framework. Its architecture, data pipeline, and benchmark results place it within the broader lineup of Mixtral derivatives, alongside the SFT-only version and the base Mixtral 8x7B model. Comparative results are available for users interested in evaluating the impact of DPO training relative to standard supervised fine-tuning.