Model Report
teknium / OpenHermes 2.5 Mistral 7B
OpenHermes 2.5 Mistral 7B is a 7.24 billion parameter language model fine-tuned from Mistral-7B-v0.1 using distilled supervised fine-tuning and direct preference optimization. Developed by teknium, it was trained on approximately 1,000,000 dialogue entries with 7-14% programming instructions, achieving notable improvements in conversational AI, coding tasks, and general language performance across standard benchmarks including HumanEval and TruthfulQA.
OpenHermes 2.5 Mistral 7B is a large language model (LLM) developed by Teknium as a continuation and improvement upon previous OpenHermes iterations. Built by fine-tuning the Mistral-7B-v0.1 architecture, OpenHermes 2.5 integrates advanced alignment and training strategies to enhance its utility for conversational, creative, and code-oriented tasks. The model’s name pays homage to Hermes, the Greek messenger god, symbolizing its communicative role as an AI assistant.
OpenHermes 2.5 Mistral 7B is grounded in the Mistral 7B architecture, a transformer model with approximately 7.24 billion parameters that achieves efficient inference through techniques such as grouped-query attention and sliding-window attention. The fine-tuning process draws on methodologies established in models like Zephyr-7B, which use a pipeline of distilled supervised fine-tuning followed by preference optimization on AI-generated feedback.
Training leveraged the axolotl fine-tuning framework for data transformation, ensuring compatibility with standard formats such as ShareGPT and ChatML. This structured approach facilitates improved multi-turn dialogue, more nuanced system prompting, and reliable alignment with human preferences. The resulting model exhibits strong generalist performance, enhanced in particular by the inclusion of code-based instruction data during training.
Datasets and Alignment Techniques
The development of OpenHermes 2.5 Mistral 7B involved curated datasets comprising roughly 1,000,000 dialogue entries, primarily generated via GPT-4, and supplemented with additional high-quality publicly available data. Significant portions of the dataset—estimated between 7% and 14%—contain programming instructions, contributing to measurable improvements in both code and general language tasks.
Following processes observed in Zephyr-7B, training included extensive supervised fine-tuning on multi-turn conversations, collection of preference data rated by large language models such as GPT-4, and distilled direct preference optimization (dDPO), an approach that directly optimizes for responses preferred by teacher models.
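As a rough illustration of the dDPO objective, the following is a minimal sketch of the standard DPO loss over per-sequence log-probabilities; it is not teknium's training code, and the variable names and beta value are illustrative assumptions:

import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Log-ratios of the trained policy against the frozen reference
    # (dSFT) model, for the preferred and rejected responses.
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    # The loss widens the margin between preferred and rejected
    # responses; beta controls deviation from the reference model.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

In the distilled variant, the "chosen" and "rejected" labels come from a teacher model such as GPT-4 rather than from human annotators.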
Transformation into the ChatML format ensured consistent prompt and response structure, aiding reproducible alignment and improving interoperability with inference tooling.
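As a concrete sketch, a ShareGPT-style record (a list of turns with "from"/"value" keys, a common but not universal convention) can be rendered into a single ChatML string roughly as follows; the role mapping is an assumption, since exact speaker tags vary across datasets:

# Map ShareGPT speaker tags onto ChatML roles.
ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant"}

def sharegpt_to_chatml(conversations):
    """Render a ShareGPT-style turn list as one ChatML-formatted string."""
    segments = [
        f"<|im_start|>{ROLE_MAP[turn['from']]}\n{turn['value']}<|im_end|>"
        for turn in conversations
    ]
    return "\n".join(segments)

record = [
    {"from": "system", "value": "You are Hermes 2, a helpful assistant."},
    {"from": "human", "value": "Hello, who are you?"},
]
print(sharegpt_to_chatml(record))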
Performance Evaluation and Benchmarks
OpenHermes 2.5 Mistral 7B achieves competitive results across a variety of benchmarks, frequently surpassing previous OpenHermes and other Mistral-based fine-tuned models at this scale. Notably, the integration of additional code-centric instruction data during training improved performance on benchmark suites such as GPT4All, AGIEval, TruthfulQA, and HumanEval.
OpenHermes 2.5 Mistral 7B is positioned as a general-purpose conversational agent, demonstrating strength in a wide array of practical applications. Its outputs showcase proficiency in programming assistance, creative composition, philosophical discussion, and character roleplay.
Model output demonstrating the generation of Python code for OpenAI API usage, in response to a user request for programming assistance.
OpenHermes 2.5 Mistral 7B employs the ChatML prompt format, structured to facilitate multi-turn dialogues with consistent system and message roles. This format is compatible with OpenAI-style chat endpoints and with libraries such as Hugging Face Transformers.
A typical prompt sequence in ChatML may appear as follows:
<|im_start|>system
You are Hermes 2, a superintelligent artificial intelligence developed by Teknium. Your purpose is to assist users with any request.<|im_end|>
<|im_start|>user
Hello, who are you?<|im_end|>
<|im_start|>assistant
Hi there! My name is Hermes 2, and I am here to assist you.<|im_end|>
When interacting locally, graphical tools such as LM Studio support easy configuration of prompt templates. Selection of the "ChatML" preset within such interfaces ensures compatibility with OpenHermes 2.5’s expected input structure.
Screenshot showing the selection of the "ChatML" prompt preset in LM Studio for OpenHermes 2.5 Mistral 7B.
The tokenizer from Hugging Face Transformers can apply the ChatML template programmatically via tokenizer.apply_chat_template(); passing add_generation_prompt=True appends the opening assistant header so that generation continues in the assistant role.
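A minimal end-to-end sketch, assuming the Hugging Face repository id teknium/OpenHermes-2.5-Mistral-7B ships the ChatML chat template in its tokenizer config; the generation parameters are illustrative, not recommended settings:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "teknium/OpenHermes-2.5-Mistral-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    {"role": "system", "content": "You are Hermes 2, a superintelligent "
     "artificial intelligence developed by Teknium."},
    {"role": "user", "content": "Hello, who are you?"},
]

# apply_chat_template wraps each message in <|im_start|>/<|im_end|> markers;
# add_generation_prompt=True appends the opening assistant header so the
# model continues in the assistant role.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))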
Model Family, Limitations, and Considerations
OpenHermes 2.5 Mistral 7B is part of the broader Hermes model family, including earlier iterations such as OpenHermes-1 Llama-2 13B, OpenHermes 2 Mistral 7B, and larger-scale versions like Hermes 70B. Each successive generation integrates refinements in dataset construction, alignment, and system prompt usage.
Despite advances, limitations inherited from similar fine-tuning pipelines persist. The use of teacher models like GPT-4 for evaluation and preference data collection introduces potential biases, as newer models may be indirectly optimized for scores on benchmarks that rely on the same teacher’s outputs. Furthermore, while the addition of code data improved overall performance, certain specialized domains—particularly highly technical math and safety-critical dialogues—may not reach top benchmark results compared to larger, proprietary models. Current training methodologies focus on helpfulness and alignment but do not directly address all aspects of safe or harm-avoiding behavior, which requires additional curation and evaluation strategies.