Model Report
microsoft / Phi-3.5 Mini Instruct
Phi-3.5 Mini Instruct is a 3.8 billion parameter decoder-only Transformer model developed by Microsoft that supports multilingual text generation with a 128,000-token context window. The model demonstrates competitive performance across 22 languages and excels in reasoning, code generation, and long-context tasks, achieving an average benchmark score of 61.4 while maintaining efficient resource utilization.
Phi-3.5 Mini Instruct is an open, compact large language model developed by Microsoft. As part of the Phi-3 family, this model is designed to offer efficient reasoning and multilingual capabilities in a lightweight architecture. Phi-3.5 Mini Instruct extends the approach taken in previous Phi-3 Mini models, incorporating additional post-training data and advancements in model architecture to support both English and diverse multilingual tasks. With a focus on dense reasoning, long context handling, and improved instruction-following, it is positioned as a resource-efficient solution for a broad range of research and application scenarios.
Scatter plot illustrating the efficiency of Phi-3.5-mini and Phi-3.5-MoE models in relation to quality (MMLU Avg) and parameter size.
Phi-3.5 Mini Instruct is implemented as a dense, decoder-only Transformer architecture comprising 3.8 billion parameters, and it employs the same tokenizer as the earlier Phi-3 Mini model. A notable feature is its support for a context window of up to 128,000 tokens, increasing its ability to process and reason over lengthy documents and conversations. This extended context length enables advanced tasks such as long document summarization, question answering, and retrieval-augmented generation across entire meetings or knowledge bases, as described in the official technical report.
The model uses a 32,064-token vocabulary optimized for a range of high-resource languages. Training was completed on 512 H100-80G GPUs over roughly ten days, a comparatively short run that reflects the efficiency of its development pipeline.
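As a rough illustration of these architectural details, the sketch below loads the model's configuration and tokenizer from the Hugging Face Hub (the repo id "microsoft/Phi-3.5-mini-instruct" and a recent transformers release are assumptions) and prints the fields that correspond to the figures quoted above.

```python
# Minimal sketch: inspect the published configuration of Phi-3.5 Mini Instruct.
# Assumes the Hugging Face repo id "microsoft/Phi-3.5-mini-instruct" and a
# transformers version recent enough to include the Phi-3 architecture.
from transformers import AutoConfig, AutoTokenizer

model_id = "microsoft/Phi-3.5-mini-instruct"
config = AutoConfig.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

print(config.vocab_size)               # expected: 32064
print(config.max_position_embeddings)  # expected: 131072 (the ~128K-token window)
print(config.num_hidden_layers, config.hidden_size)  # dense decoder-only stack
print(len(tokenizer))                  # tokenizer size, including special tokens
```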
Multilingual and Reasoning Abilities
Phi-3.5 Mini Instruct offers broad multilingual coverage, benefiting from enhanced post-training data that improves both language coverage and reasoning. Supported languages include Arabic, Chinese, Czech, Danish, Dutch, English, Finnish, French, German, Hebrew, Hungarian, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Russian, Spanish, Swedish, Thai, Turkish, and Ukrainian. The model is tuned in particular for multi-turn conversation and complex reasoning in these languages, though performance may vary in lower-resource settings.
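A minimal sketch of multi-turn, non-English use is shown below, assuming the chat template shipped with the tokenizer on the Hugging Face Hub (repo id and generation settings are illustrative, not official recommendations).

```python
# Sketch of a multi-turn French conversation with Phi-3.5 Mini Instruct.
# Repo id and generation parameters are assumptions for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3.5-mini-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "user", "content": "Explique brièvement la photosynthèse."},
    {"role": "assistant", "content": "La photosynthèse convertit la lumière en énergie chimique."},
    {"role": "user", "content": "Et quel rôle joue la chlorophylle ?"},  # follow-up turn
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=200, do_sample=False)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```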
Evaluations indicate improvements in multilingual tasks compared to previous Phi-3 models and other open models with similar or greater parameter counts. The model achieved an average score of 61.4 across major benchmarks, with 55.4 on Multilingual MMLU (Massive Multitask Language Understanding) and 47.9 on the MGSM (Multilingual Grade School Math) benchmark. These results reflect competitive reasoning performance, effective multilingual understanding, and progress on long-context and code-related tasks.
When evaluated specifically on multilingual MMLU for representative languages, the model generalizes consistently across diverse linguistic scenarios, as detailed in benchmark studies.
Data table reporting the Phi-3.5-mini-instruct model's multilingual MMLU scores compared to other models across selected supported languages.
Phi-3.5 Mini Instruct also demonstrates strong reasoning and logic capabilities, particularly in code generation, mathematical reasoning, and structured problem-solving. On code benchmarks such as HumanEval and MBPP, it performs competitively with, and in some cases surpasses, models of similar or larger size.
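For a concrete sense of code-generation use, the sketch below sends a chat-formatted coding prompt through the transformers text-generation pipeline; it assumes a recent transformers release that accepts chat messages in the pipeline, and the repo id and settings are illustrative only.

```python
# Illustrative code-generation prompt via the transformers pipeline API.
# Assumes a transformers version whose text-generation pipeline accepts
# chat-formatted message lists; repo id and settings are illustrative.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="microsoft/Phi-3.5-mini-instruct",
    device_map="auto",
    torch_dtype="auto",
)

messages = [
    {"role": "user",
     "content": "Write a Python function that returns the n-th Fibonacci number iteratively."}
]
result = generator(messages, max_new_tokens=256, do_sample=False)
print(result[0]["generated_text"][-1]["content"])  # the assistant's reply
```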
Training Data and Fine-tuning Techniques
The training corpus for Phi-3.5 Mini Instruct consists of approximately 3.4 trillion tokens drawn from rigorously filtered public sources, educational materials, and high-quality code. Notably, the corpus is augmented with synthetic “textbook-like” data targeting complex math, coding, common-sense, and theory-of-mind tasks.
Instruction-following capability is enhanced through supervised fine-tuning (SFT), proximal policy optimization (PPO), and direct preference optimization (DPO), using a blend of human-labeled, synthetic, and translated datasets. The primary data sources have a cutoff date of October 2023, so the model's knowledge reflects developments only up to that point.
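To make the DPO stage concrete, the sketch below implements the core DPO objective in PyTorch. This is a generic illustration of the technique, not Microsoft's training code; it assumes per-sequence log-probabilities for "chosen" and "rejected" responses under the policy and a frozen reference model have already been computed, and the numbers are toy values.

```python
# Minimal sketch of the Direct Preference Optimization (DPO) objective,
# not Microsoft's training pipeline. Inputs are per-sequence log-probabilities
# of preferred ("chosen") and dispreferred ("rejected") responses.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Log-probability margins of the policy relative to the reference model.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the probability that the chosen response is ranked above the rejected one.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy batch of two preference pairs (values are illustrative only).
loss = dpo_loss(
    policy_chosen_logps=torch.tensor([-12.0, -9.5]),
    policy_rejected_logps=torch.tensor([-11.0, -10.0]),
    ref_chosen_logps=torch.tensor([-12.5, -9.8]),
    ref_rejected_logps=torch.tensor([-10.8, -9.9]),
)
print(loss)
```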
Performance Benchmarks and Evaluation
Model evaluation has been conducted across a diverse array of benchmarks, with results reported alongside other leading models such as Mistral-7B-Instruct-v0.3, Llama-3.1-8B-Instruct, Gemma-2-9B-Instruct, Gemini 1.5 Flash, and recent GPT-4 variants. In these comparisons, Phi-3.5 Mini Instruct posts competitive average scores in reasoning and language understanding, achieves high scores on long-context benchmarks such as RULER (an average of 84.1 across context lengths up to 128,000 tokens), and delivers robust performance in code understanding (an average score of 77 on RepoQA).
Comprehensive benchmark table detailing the performance of Phi-3.5-mini-instruct versus peer models across reasoning, multilingual, math, long context, and code generation tasks.
On specific regional benchmarks, such as CLIcK for Korean, Phi-3.5 Mini Instruct demonstrates improved language-specific performance relative to prior models, indicating progress in both multilingual and regional task coverage.
Applications, Limitations, and Model Family
Phi-3.5 Mini Instruct is intended for both commercial and research applications, optimized for memory- or compute-constrained environments as well as latency-sensitive deployments. Its most effective uses are in tasks requiring reasoning, particularly code, math, and long-context summarization or question answering.
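One common pattern for such constrained deployments is weight quantization. The sketch below loads the model in 4-bit precision via bitsandbytes; the repo id and quantization settings are assumptions for illustration, not the model card's official recommendations.

```python
# Hedged sketch of a memory-constrained deployment: 4-bit quantized loading.
# Requires the accelerate and bitsandbytes packages; settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "microsoft/Phi-3.5-mini-instruct"
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # places quantized weights on the available GPU(s)
)
```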
Notable limitations include reduced performance for some lower-resource languages, safety considerations common to language models (such as potential bias or output of inappropriate content), and constraints on factual knowledge storage due to the relatively small parameter size. Information reliability can be enhanced by supplementing the model with retrieval-augmented generation (RAG) pipelines.
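A RAG pipeline in this setting simply retrieves relevant passages and places them in the model's long context ahead of the question. The toy sketch below uses naive keyword-overlap retrieval purely for illustration; the documents, scoring heuristic, and prompt wording are assumptions, and a production system would use embedding-based retrieval.

```python
# Toy retrieval-augmented generation (RAG) sketch: naive keyword-overlap
# retrieval feeding the model's long context window. Illustrative only.
documents = [
    "The 2023 maintenance report notes that pump A requires new seals.",
    "Quarterly revenue grew 12% year over year, driven by cloud services.",
    "Pump B was replaced in 2022 and is under warranty until 2027.",
]

def retrieve(query, docs, k=2):
    # Score documents by shared lowercase words with the query (toy heuristic).
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return scored[:k]

query = "Which pump needs new seals?"
context = "\n".join(retrieve(query, documents))
messages = [{
    "role": "user",
    "content": f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}",
}]
# `messages` can then be passed through apply_chat_template and generate,
# exactly as in the multi-turn chat sketch earlier in this report.
print(messages[0]["content"])
```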
Phi-3.5 Mini Instruct is accompanied by related models in the Phi-3.5 family, including Phi-3.5-MoE-instruct (a Mixture-of-Experts model with more active parameters and stronger performance on complex tasks) and Phi-3.5-vision-instruct (extending capabilities to multi-frame visual reasoning). The broader family also includes the subsequent Phi-4 models, directed at multimodal and more advanced language tasks.
The model is released under the MIT license, supporting open usage and research.