Model Report
microsoft / Phi-3.5 Mini Instruct
Phi-3.5 Mini Instruct is a 3.8 billion parameter decoder-only Transformer model developed by Microsoft that supports multilingual text generation with a 128,000-token context window. The model demonstrates competitive performance across 22 languages and excels in reasoning, code generation, and long-context tasks, achieving an average benchmark score of 61.4 while maintaining efficient resource utilization.
Phi-3.5 Mini Instruct is an open, compact large language model developed by Microsoft. As part of the Phi-3 family, this model is designed to offer efficient reasoning and multilingual capabilities in a lightweight architecture. Phi-3.5 Mini Instruct extends the approach taken in previous Phi-3 Mini models, incorporating additional post-training data and advancements in model architecture to support both English and diverse multilingual tasks. With a focus on dense reasoning, long context handling, and improved instruction-following, it is positioned as a resource-efficient solution for a broad range of research and application scenarios.
Scatter plot illustrating the efficiency of Phi-3.5-mini and Phi-3.5-MoE models in relation to quality (MMLU Avg) and parameter size.
Phi-3.5 Mini Instruct is implemented as a dense, decoder-only Transformer architecture comprising 3.8 billion parameters, and it employs the same tokenizer as the earlier Phi-3 Mini model. A notable feature is its support for a context window of up to 128,000 tokens, increasing its ability to process and reason over lengthy documents and conversations. This extended context length enables advanced tasks such as long document summarization, question answering, and retrieval-augmented generation across entire meetings or knowledge bases, as described in the official technical report.
The model uses a 32,064-token vocabulary optimized for a range of high-resource languages. Training was completed on 512 H100-80G GPUs over roughly ten days, a comparatively short run that reflects the efficiency of its development pipeline.
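As a rough illustration of these architectural details, the sketch below loads the model's configuration and tokenizer from the Hugging Face Hub (the repo id "microsoft/Phi-3.5-mini-instruct" and a recent transformers release are assumptions) and prints the fields that correspond to the figures quoted above.

```python
# Minimal sketch: inspect the published configuration of Phi-3.5 Mini Instruct.
# Assumes the Hugging Face repo id "microsoft/Phi-3.5-mini-instruct" and a
# transformers version recent enough to include the Phi-3 architecture.
from transformers import AutoConfig, AutoTokenizer

model_id = "microsoft/Phi-3.5-mini-instruct"
config = AutoConfig.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

print(config.vocab_size)               # expected: 32064
print(config.max_position_embeddings)  # expected: 131072 (the ~128K-token window)
print(config.num_hidden_layers, config.hidden_size)  # dense decoder-only stack
print(len(tokenizer))                  # tokenizer size, including special tokens
```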
Multilingual and Reasoning Abilities
Phi-3.5 Mini Instruct offers broad multilingual coverage, benefiting from enhanced post-training data that improves both language coverage and reasoning. Supported languages include Arabic, Chinese, Czech, Danish, Dutch, English, Finnish, French, German, Hebrew, Hungarian, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Russian, Spanish, Swedish, Thai, Turkish, and Ukrainian. The model is tuned in particular for multi-turn conversation and complex reasoning in these languages, though performance may vary in lower-resource settings.
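A minimal sketch of multi-turn, non-English use is shown below, assuming the chat template shipped with the tokenizer on the Hugging Face Hub (repo id and generation settings are illustrative, not official recommendations).

```python
# Sketch of a multi-turn French conversation with Phi-3.5 Mini Instruct.
# Repo id and generation parameters are assumptions for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3.5-mini-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "user", "content": "Explique brièvement la photosynthèse."},
    {"role": "assistant", "content": "La photosynthèse convertit la lumière en énergie chimique."},
    {"role": "user", "content": "Et quel rôle joue la chlorophylle ?"},  # follow-up turn
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=200, do_sample=False)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```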
Evaluations indicate improvements in multilingual tasks compared to previous Phi-3 models and other open models with similar or greater parameter counts. The model achieved an average score of 61.4 across major benchmarks, with 55.4 on Multilingual MMLU (Massive Multitask Language Understanding) and 47.9 on the MGSM (Multilingual Grade School Math) benchmark. These results reflect competitive reasoning performance, effective multilingual understanding, and progress on long-context and code-related tasks.
When evaluated specifically on multilingual MMLU for representative languages, the model generalizes consistently across diverse linguistic scenarios, as detailed in benchmark studies.
Data table reporting the Phi-3.5-mini-instruct model's multilingual MMLU scores compared to other models across selected supported languages.
Phi-3.5 Mini Instruct also demonstrates strong reasoning and logic capabilities, particularly in code generation, mathematical reasoning, and structured problem-solving. On code benchmarks such as HumanEval and MBPP, it performs competitively with, and in some cases surpasses, models of similar or larger size.
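For a concrete sense of code-generation use, the sketch below sends a chat-formatted coding prompt through the transformers text-generation pipeline; it assumes a recent transformers release that accepts chat messages in the pipeline, and the repo id and settings are illustrative only.

```python
# Illustrative code-generation prompt via the transformers pipeline API.
# Assumes a transformers version whose text-generation pipeline accepts
# chat-formatted message lists; repo id and settings are illustrative.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="microsoft/Phi-3.5-mini-instruct",
    device_map="auto",
    torch_dtype="auto",
)

messages = [
    {"role": "user",
     "content": "Write a Python function that returns the n-th Fibonacci number iteratively."}
]
result = generator(messages, max_new_tokens=256, do_sample=False)
print(result[0]["generated_text"][-1]["content"])  # the assistant's reply
```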
Training Data and Fine-tuning Techniques
The training corpus for Phi-3.5 Mini Instruct consists of approximately 3.4 trillion tokens drawn from rigorously filtered public sources, educational materials, and high-quality code. Notably, the corpus is augmented with synthetic “textbook-like” data targeting complex math, coding, common-sense, and theory-of-mind tasks.
Instruction-following capability is enhanced through supervised fine-tuning (SFT), proximal policy optimization (PPO), and direct preference optimization (DPO), using a blend of human-labeled, synthetic, and translated datasets. The primary data sources have a cutoff date of October 2023, so the model's knowledge reflects developments only up to that point.
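To make the DPO stage concrete, the sketch below implements the core DPO objective in PyTorch. This is a generic illustration of the technique, not Microsoft's training code; it assumes per-sequence log-probabilities for "chosen" and "rejected" responses under the policy and a frozen reference model have already been computed, and the numbers are toy values.

```python
# Minimal sketch of the Direct Preference Optimization (DPO) objective,
# not Microsoft's training pipeline. Inputs are per-sequence log-probabilities
# of preferred ("chosen") and dispreferred ("rejected") responses.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Log-probability margins of the policy relative to the reference model.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the probability that the chosen response is ranked above the rejected one.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy batch of two preference pairs (values are illustrative only).
loss = dpo_loss(
    policy_chosen_logps=torch.tensor([-12.0, -9.5]),
    policy_rejected_logps=torch.tensor([-11.0, -10.0]),
    ref_chosen_logps=torch.tensor([-12.5, -9.8]),
    ref_rejected_logps=torch.tensor([-10.8, -9.9]),
)
print(loss)
```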
Performance Benchmarks and Evaluation
Model evaluation has been conducted across a diverse array of benchmarks, with results reported alongside other leading models such as Mistral-7B-Instruct-v0.3, Llama-3.1-8B-Instruct, Gemma-2-9B-Instruct, Gemini 1.5 Flash, and recent GPT-4 variants. In these comparisons, Phi-3.5 Mini Instruct posts competitive average scores in reasoning and language understanding, achieves high scores on long-context benchmarks such as RULER (an average of 84.1 across context lengths up to 128,000 tokens), and delivers robust performance in code understanding (an average score of 77 on RepoQA).
Comprehensive benchmark table detailing the performance of Phi-3.5-mini-instruct versus peer models across reasoning, multilingual, math, long context, and code generation tasks.
On specific regional benchmarks, such as CLIcK for Korean, Phi-3.5 Mini Instruct demonstrates improved language-specific performance relative to prior models, indicating progress in both multilingual and regional task coverage.
Applications, Limitations, and Model Family
Phi-3.5 Mini Instruct is intended for both commercial and research applications, optimized for memory- or compute-constrained environments as well as latency-sensitive deployments. Its most effective uses are in tasks requiring reasoning, particularly code, math, and long-context summarization or question answering.
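One common pattern for such constrained deployments is weight quantization. The sketch below loads the model in 4-bit precision via bitsandbytes; the repo id and quantization settings are assumptions for illustration, not the model card's official recommendations.

```python
# Hedged sketch of a memory-constrained deployment: 4-bit quantized loading.
# Requires the accelerate and bitsandbytes packages; settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "microsoft/Phi-3.5-mini-instruct"
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # places quantized weights on the available GPU(s)
)
```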
Notable limitations include reduced performance for some lower-resource languages, safety considerations common to language models (such as potential bias or output of inappropriate content), and constraints on factual knowledge storage due to the relatively small parameter size. Information reliability can be enhanced by supplementing the model with retrieval-augmented generation (RAG) pipelines.
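A RAG pipeline in this setting simply retrieves relevant passages and places them in the model's long context ahead of the question. The toy sketch below uses naive keyword-overlap retrieval purely for illustration; the documents, scoring heuristic, and prompt wording are assumptions, and a production system would use embedding-based retrieval.

```python
# Toy retrieval-augmented generation (RAG) sketch: naive keyword-overlap
# retrieval feeding the model's long context window. Illustrative only.
documents = [
    "The 2023 maintenance report notes that pump A requires new seals.",
    "Quarterly revenue grew 12% year over year, driven by cloud services.",
    "Pump B was replaced in 2022 and is under warranty until 2027.",
]

def retrieve(query, docs, k=2):
    # Score documents by shared lowercase words with the query (toy heuristic).
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return scored[:k]

query = "Which pump needs new seals?"
context = "\n".join(retrieve(query, documents))
messages = [{
    "role": "user",
    "content": f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}",
}]
# `messages` can then be passed through apply_chat_template and generate,
# exactly as in the multi-turn chat sketch earlier in this report.
print(messages[0]["content"])
```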
Phi-3.5 Mini Instruct is accompanied by related models in the Phi-3.5 family, including Phi-3.5-MoE-instruct (a Mixture-of-Experts model with more active parameters and stronger performance on complex tasks) and Phi-3.5-vision-instruct (extending capabilities to multi-frame visual reasoning). The broader family also includes the subsequent Phi-4 models, directed at multimodal and more advanced language tasks.
The model is released under the MIT license, supporting open usage and research.