Model Report
cognitivecomputations / Dolphin 2.6 Mistral
Dolphin 2.6 Mistral is a 7.24 billion parameter text generation model developed by Cognitive Computations, built upon the Mistral-7B architecture with a 16,000-token context window. The model employs an explanation tuning methodology inspired by Microsoft's Orca research, utilizing uncensored datasets and Direct Preference Optimization to strengthen reasoning and instruction following across coding and general tasks.
Dolphin 2.6 Mistral is an open-source, instruct-tuned large language model developed by Cognitive Computations with sponsorship from Convai. Built upon the Mistral-7B architecture, this generative model aims to advance reasoning and instruction-following capabilities through progressive learning techniques adapted from Microsoft’s Orca research. The Dolphin series utilizes uncensored datasets and explanation-based tuning to generate detailed, stepwise responses across a variety of domains, including coding and general reasoning.
Cyborg dolphin illustration used as a visual identifier for Dolphin 2.6 Mistral.
Dolphin 2.6 Mistral is based on the Mistral-7B foundation, a 7.24 billion parameter transformer with a 16,000-token context window. The development and training practices are heavily influenced by the Orca project from Microsoft Research, which introduced progressive learning from complex reasoning traces generated by large foundation models (LFMs) such as GPT-4. Dolphin’s architecture and training pipeline incorporate Direct Preference Optimization (DPO) to align outputs with user preference data and maximize instruction-following performance.
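For readers who want to experiment locally, the published weights can be loaded through standard open-source tooling. The sketch below assumes the Hugging Face transformers library and the repository id cognitivecomputations/dolphin-2.6-mistral-7b; the precision and generation settings are illustrative choices, not documented requirements.

```python
# Minimal sketch: loading Dolphin 2.6 Mistral with Hugging Face transformers.
# The repository id and dtype/device settings are assumptions, not official guidance.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cognitivecomputations/dolphin-2.6-mistral-7b"  # assumed HF repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # 7.24B parameters; bf16 keeps VRAM use manageable
    device_map="auto",
)

prompt = "Explain, step by step, how binary search works."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```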
The model was trained using Axolotl, a widely adopted language model finetuning framework. Tokenization follows approaches similar to the Orca model, relying on a variant of byte pair encoding and padding strategies to accommodate variable-length sequences. Packing and loss computation techniques are implemented to efficiently utilize larger training batches while focusing learning on teacher-labeled tokens, as described in the Orca technical report. Training for Dolphin 2.6 Mistral was performed using high-performance GPU clusters, facilitating full-parameter finetuning across several epochs.
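The following Python sketch illustrates the general idea of computing loss only on teacher-labeled tokens by masking the prompt portion of each sequence; it uses the conventional -100 ignore index from PyTorch cross-entropy and is not the exact Axolotl implementation.

```python
# Illustrative sketch of masking prompt tokens so loss is computed only on the
# teacher-generated response, in the spirit of Orca-style finetuning.
# The -100 ignore index follows common PyTorch/transformers practice; this is
# not the actual Axolotl code path used to train Dolphin 2.6 Mistral.
import torch

def build_labels(prompt_ids: list[int], response_ids: list[int]) -> dict:
    input_ids = prompt_ids + response_ids
    # -100 tells PyTorch's cross-entropy loss to ignore these positions,
    # so gradients come only from the teacher-labeled response tokens.
    labels = [-100] * len(prompt_ids) + response_ids
    return {
        "input_ids": torch.tensor(input_ids),
        "labels": torch.tensor(labels),
    }

example = build_labels(prompt_ids=[1, 42, 87, 311], response_ids=[523, 99, 2])
print(example["labels"])  # tensor([-100, -100, -100, -100, 523, 99, 2])
```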
Training Methods and Datasets
The Dolphin 2.6 Mistral model employs a DPO tuning process, using the argilla/ultrafeedback-binarized-preferences-cleaned dataset to align responses with human preferences. The broader training set implements principles from Orca’s explanation tuning paradigm: data entries are paired with both baseline prompts and rich, step-by-step explanations produced by GPT-4 and ChatGPT. This progressive learning strategy proceeds in stages, initially training on ChatGPT-augmented data and then refining on longer, more intricate GPT-4 explanations, to bridge the capacity gap between large foundation models and smaller, open-source models.
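As a rough illustration of this stage, the sketch below runs a DPO pass over the named preference dataset using the TRL library's DPOTrainer; the configuration values, column handling, and exact argument names (which vary across trl versions) are assumptions rather than the recipe actually used for Dolphin 2.6.

```python
# Minimal sketch of a DPO pass over the preference dataset named in the model card.
# API details (DPOConfig fields, expected column format, processing_class vs. the
# older tokenizer= argument) vary across trl versions and are assumptions here.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "cognitivecomputations/dolphin-2.6-mistral-7b"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# Preference pairs: each row supplies a prompt with a preferred and a rejected answer.
# Depending on the trl version, the chosen/rejected columns may need reformatting.
dataset = load_dataset(
    "argilla/ultrafeedback-binarized-preferences-cleaned", split="train"
)

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="dolphin-dpo", per_device_train_batch_size=1),
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```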
A significant portion of the dataset follows the FLAN-v2 task distribution, which includes zero-shot and chain-of-thought (CoT) tasks. The datasets are filtered to remove alignment-oriented content, refusals, and biased responses, producing an uncensored model that complies with a broad range of user instructions. Notably, substantial coding datasets are included, enabling enhanced performance on code generation and problem-solving tasks.
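A simplified sketch of this kind of filtering is shown below; the refusal phrases and record fields are hypothetical placeholders rather than the actual criteria used to build the Dolphin datasets.

```python
# Illustrative sketch: dropping training examples whose responses contain refusal
# or alignment boilerplate. The marker phrases and field names are hypothetical,
# not the real filters applied to the Dolphin training data.
REFUSAL_MARKERS = (
    "as an ai language model",
    "i'm sorry, but i cannot",
    "i cannot assist with",
)

def keep_example(example: dict) -> bool:
    response = example.get("response", "").lower()
    return not any(marker in response for marker in REFUSAL_MARKERS)

dataset = [
    {"prompt": "Write a sorting function.", "response": "Here is a quicksort..."},
    {"prompt": "Explain DNS.", "response": "As an AI language model, I cannot..."},
]
filtered = [ex for ex in dataset if keep_example(ex)]
print(len(filtered))  # 1 -- the refusal-style example is removed
```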
Benchmark Performance
Dolphin 2.6 Mistral 7B demonstrates competitive results on multiple standard benchmarks. According to the Open LLM Leaderboard, the model attains an aggregate score of 67.20, reflecting strong capabilities across diverse reasoning domains.
Scores on major evaluations include 65.61 on the AI2 Reasoning Challenge (25-shot), 85.48 on HellaSwag (10-shot), and 63.24 on MMLU (5-shot). TruthfulQA (0-shot) performance is recorded at 61.47, Winogrande (5-shot) at 78.61, and GSM8k (5-shot) at 48.75. These results place Dolphin 2.6 Mistral close to established open-source models and within reach of commercial systems on a range of knowledge, common-sense, and open-ended generation tasks.
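The aggregate figure is consistent with an unweighted mean of the six per-task scores, as the short calculation below shows.

```python
# The leaderboard aggregate is approximately the unweighted mean of the six scores above.
scores = {
    "ARC (25-shot)": 65.61,
    "HellaSwag (10-shot)": 85.48,
    "MMLU (5-shot)": 63.24,
    "TruthfulQA (0-shot)": 61.47,
    "Winogrande (5-shot)": 78.61,
    "GSM8k (5-shot)": 48.75,
}
print(round(sum(scores.values()) / len(scores), 2))
# 67.19 -- consistent with the reported 67.20 given per-task rounding
```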
Comparative analyses guided by the Orca paper show that explanation-tuned models, including Orca-13B, retain a high proportion of ChatGPT and GPT-4 performance on open-ended tasks, and substantially outperform other open-source models such as Vicuna-13B in reasoning and truthfulness measures.
Design Features and Use Cases
A defining feature of Dolphin 2.6 Mistral is its use of "explanation tuning," in which system messages instruct the model to emulate stepwise reasoning and detailed justifications, reminiscent of human problem-solving logic. Sixteen manually crafted instruction templates guide output length, structure, and chain-of-thought content, fostering adaptability across diverse tasks and enhancing transparency in model reasoning. Unlike conventional instruction tuning, which learns solely from question-answer pairs, the inclusion of intermediate explanation traces yields richer supervisory signals.
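The Dolphin series uses the ChatML prompt format, so an explanation-style system message can be supplied directly in the prompt. The wording below is a hypothetical example of such a template, not one of the original sixteen.

```python
# Sketch of an explanation-style system message in ChatML format, the prompt
# format used by the Dolphin series. The system message wording is a hypothetical
# stand-in for the instruction templates described above.
system = (
    "You are Dolphin, a helpful AI assistant. Think through the problem "
    "step by step and explain your reasoning before giving the final answer."
)
user = "A train travels 180 km in 2.5 hours. What is its average speed?"

prompt = (
    f"<|im_start|>system\n{system}<|im_end|>\n"
    f"<|im_start|>user\n{user}<|im_end|>\n"
    f"<|im_start|>assistant\n"
)
print(prompt)
```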
The uncensored nature of the model results from meticulous data filtering, which seeks to remove alignment and bias interventions commonly present in mainstream language model training. This approach makes the model responsive to a broader class of user requests, but also places the responsibility of output filtering and alignment on downstream users.
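A deployer might therefore wrap generation in a post-hoc check of their own, along the lines of the deliberately simple placeholder below; the policy list and handling are entirely hypothetical and stand in for whatever moderation a given deployment requires.

```python
# Hypothetical post-generation check a deployer might add, since the model ships
# without built-in refusals. The blocked-topic list is a placeholder policy.
BLOCKED_TOPICS = ("example-disallowed-topic",)

def moderate(generated_text: str) -> str:
    if any(topic in generated_text.lower() for topic in BLOCKED_TOPICS):
        return "[response withheld by deployment policy]"
    return generated_text

print(moderate("A normal answer about networking."))
```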
Dolphin’s design and tuning make it suitable for advanced reasoning, instruction following, coding, structured output, and agentic applications. Planned iterations, such as Dolphin 3.0, are intended to further optimize chat, role-playing, and multi-turn dialogue scenarios.
Limitations and Ethical Considerations
While Dolphin 2.6 Mistral is designed for broad compliance and responsiveness, its uncensored nature means content filtering and alignment must be managed by the deploying party. The model remains subject to general large language model limitations, including inherited biases from training data, lack of genuine world understanding, susceptibility to hallucinations, and challenges in providing fully transparent rationales for output.
As an open-source model influenced by teacher-student imitation, Dolphin's performance in complex multi-turn conversations or in-context few-shot learning remains less well-explored. Its effectiveness is generally strongest on domains and formats well-represented in its tuning data, such as code generation and single-turn reasoning queries. The Orca paper emphasizes that evaluation beyond zero-shot settings, and on tasks with minimal data representation, may yield variable outcomes.
Dolphin 2.6 Mistral and its datasets are released under the Apache 2.0 license, permitting broad commercial and research use, though models based on non-commercial bases, such as LLaMA, inherit upstream license constraints. The model is intended primarily for research and further development, rather than direct downstream deployment in sensitive applications without additional scrutiny.
Model Lineage and Future Development
Dolphin 2.6 Mistral is one member of a broader Dolphin series derived from the methodologies established by Microsoft’s Orca research. The original Orca-13B model, trained on a LLaMA-13B base, set the benchmark for explanation-based, progressive learning and established parity with ChatGPT on reasoning tasks, while substantially outperforming other open-source models in several evaluations.
Further Dolphin variants are planned across multiple model bases including Falcon, OpenLLaMA, MPT, RWKV, and advanced-scale architectures. These ongoing developments signal a continued emphasis on richer supervisory signals, open-source transparency, and diverse model use cases ranging from general chat to structured agentic interactions.