Model Report
MiniMaxAI / MiniMax Text 01
MiniMax Text 01 is an open-source large language model developed by MiniMaxAI featuring 456 billion total parameters with 45.9 billion active per token. The model employs a hybrid attention mechanism combining Lightning Attention with periodic Softmax Attention layers across 80 transformer layers, utilizing a Mixture-of-Experts design with 32 experts and Top-2 routing. It supports context lengths up to 4 million tokens during inference and demonstrates competitive performance across text generation, reasoning, and coding benchmarks.
MiniMax-Text-01 is an open-source large language model (LLM) developed by MiniMax, designed for extended-context understanding and reasoning. It is part of the MiniMax-01 series, which also includes the multimodal vision-language model MiniMax-VL-01. MiniMax-Text-01 was publicly released on January 15, 2025, and its technical details have been made available to the research community.
Core text benchmark accuracy of MiniMax-Text-01 compared to large language models across seven evaluation tasks.
MiniMax-Text-01 comprises 456 billion total parameters, of which 45.9 billion are active per token. The architecture employs a hybrid attention mechanism that combines Lightning Attention (a linear, I/O-optimized attention variant derived from TransNormer) with periodic Softmax Attention layers. Specifically, a Softmax Attention layer is interleaved after every seven Lightning Attention layers, across a total of 80 transformer layers. Each layer uses 64 attention heads with a head dimension of 128, and Rotary Position Embedding (RoPE) with a large base frequency is applied to half of each attention head's dimensions to support long-sequence modeling.
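As a rough illustration of the layer layout described above (not MiniMax source code), the following Python sketch enumerates an 80-layer stack in which every eighth attention layer is softmax attention and the remainder are lightning attention:

# Illustrative sketch of the reported interleaving: 7 lightning-attention
# layers followed by 1 softmax-attention layer, repeated across 80 layers.
NUM_LAYERS = 80
SOFTMAX_EVERY = 8

layer_types = [
    "softmax_attention" if (i + 1) % SOFTMAX_EVERY == 0 else "lightning_attention"
    for i in range(NUM_LAYERS)
]

print(layer_types[:8])                          # first block: 7x lightning, then 1x softmax
print(layer_types.count("softmax_attention"))   # 10 softmax layers in total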
A Mixture-of-Experts (MoE) design comprising 32 experts with a Top-2 routing strategy allows each token to be processed by a selected pair of expert feed-forward networks, improving scalability while containing computational cost. A global routing strategy balances load across experts and prevents collapse in expert utilization. The model further integrates DeepNorm for training stability and leverages parallelization strategies such as Linear Attention Sequence Parallelism Plus (LASP+), varlen ring attention, and Expert Tensor Parallelism (ETP), enabling MiniMax-Text-01 to sustain context lengths of up to 1 million tokens during training and up to 4 million tokens during inference.
Architecture diagram highlighting the hybrid Lightning/Softmax Attention mechanism and the Top-2 Mixture-of-Experts design in MiniMax-Text-01.
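For intuition about the routing step, the sketch below implements generic Top-2 expert routing in PyTorch with 32 experts, mirroring the configuration described above; the hidden size and expert definition are toy values for illustration, not the released implementation.

# Generic Top-2 MoE routing sketch (toy dimensions; not the MiniMax implementation).
import torch
import torch.nn.functional as F

num_experts, top_k, hidden = 32, 2, 512     # 32 experts, Top-2 routing; hidden dim is illustrative
tokens = torch.randn(4, hidden)             # four token representations

router = torch.nn.Linear(hidden, num_experts, bias=False)
experts = torch.nn.ModuleList(
    torch.nn.Sequential(
        torch.nn.Linear(hidden, 4 * hidden), torch.nn.GELU(), torch.nn.Linear(4 * hidden, hidden)
    )
    for _ in range(num_experts)
)

logits = router(tokens)                               # (4, 32) routing scores per token
gate, idx = torch.topk(logits, top_k, dim=-1)         # keep the two highest-scoring experts
gate = F.softmax(gate, dim=-1)                        # normalize gate weights over the Top-2

output = torch.zeros_like(tokens)
for t in range(tokens.size(0)):                       # dispatch each token to its two experts
    for slot in range(top_k):
        output[t] += gate[t, slot] * experts[idx[t, slot].item()](tokens[t])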
Custom CUDA kernels and inference optimizations enable efficient deployment, achieving over 75% Model FLOPs Utilization (MFU) on contemporary hardware platforms. The recommended quantization for loading the model is int8, which balances memory footprint and inference speed.
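A minimal loading sketch is shown below; it assumes the checkpoint is published on Hugging Face as MiniMaxAI/MiniMax-Text-01 and that the int8 recommendation maps to the Transformers Quanto quantization path, so the official model card should be consulted if the recommended recipe differs.

# Hedged loading sketch: repo id and quantization path are assumptions to verify
# against the official model card; requires the optimum-quanto backend installed.
from transformers import AutoModelForCausalLM, AutoTokenizer, QuantoConfig

model_id = "MiniMaxAI/MiniMax-Text-01"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,                            # custom hybrid-attention/MoE modeling code
    device_map="auto",                                 # shard the 456B-parameter model across available GPUs
    quantization_config=QuantoConfig(weights="int8"),  # int8 weights, per the recommendation above
)

inputs = tokenizer(
    "Summarize the MiniMax-01 architecture in one sentence.", return_tensors="pt"
).to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))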
Training Data and Methods
The pre-training corpus for MiniMax-Text-01 was curated from diverse sources, including academic literature, books, web content, and programming code. A quality-filtering pipeline, driven by a reward labeler built on earlier MiniMax models, selects data along axes such as knowledge depth and document helpfulness, while data-mixture balancing and deduplication maintain both diversity and consistency.
Tokenization employs byte-level Byte Pair Encoding (BPE) with a vocabulary of 200,064 tokens, constructed to favor multilingual coverage. Training follows a staged approach: after initial pre-training, a three-phase context extension up-samples long-sequence data, and post-training comprises supervised fine-tuning (SFT), offline reinforcement learning via Direct Preference Optimization (DPO), and online reinforcement learning. Safety alignment is handled by a harmless reward model that balances utility with content reliability.
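As a quick check of the tokenizer properties described above, the sketch below loads the tokenizer (assuming the MiniMaxAI/MiniMax-Text-01 repository on Hugging Face) and inspects its vocabulary size and multilingual behavior:

# Tokenizer inspection sketch; the repo id is an assumption about where the
# byte-level BPE tokenizer is published.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("MiniMaxAI/MiniMax-Text-01", trust_remote_code=True)
print(len(tokenizer))                                            # vocabulary size reported as 200,064
for text in ["long-context reasoning", "长上下文推理", "raisonnement en contexte long"]:
    print(text, "->", len(tokenizer.tokenize(text)), "tokens")   # byte-level BPE covers any script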
Benchmark Performance
MiniMax-Text-01 performs competitively across a broad range of academic and applied benchmarks, with results comparable to, or exceeding, those of other LLMs such as GPT-4o, Claude-3.5-Sonnet, and Gemini-1.5-Pro. Results on core benchmarks, including multi-task language understanding, open-domain question answering, logical reasoning, mathematical problem-solving, and code synthesis, demonstrate both its general and specialized capabilities.
Benchmark comparison table showing MiniMax-Text-01's scores on core academic tasks relative to international LLMs.
In ultra-long-context evaluations, the model achieves 100% accuracy on the 4-million-token Needle-in-a-Haystack retrieval task and shows minimal degradation on the RULER benchmark at context lengths up to 1 million tokens. MiniMax-Text-01 also achieves high scores on the LongBench v2 benchmark, both with and without Chain-of-Thought reasoning. In-house evaluations further assess its performance in creative writing, information retrieval, safety, and document-analysis productivity scenarios.
Benchmark tables summarizing MiniMax-Text-01’s performance in the RULER and LongBench v2 long-context evaluations.
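For readers unfamiliar with the retrieval setup, the generic sketch below shows how a needle-in-a-haystack probe is typically constructed (a simplified recipe, not the exact MiniMax evaluation harness): a single fact is buried at a chosen depth inside long filler text and the model is asked to retrieve it.

# Generic needle-in-a-haystack construction (illustrative; token budget is a rough heuristic).
import random

def build_niah_prompt(needle, question, filler, target_tokens, depth):
    n_sentences = target_tokens // 12                    # assume ~12 tokens per filler sentence
    haystack = [filler] * n_sentences
    haystack.insert(int(depth * n_sentences), needle)    # bury the needle at the given relative depth
    return " ".join(haystack) + f"\n\nQuestion: {question}\nAnswer:"

prompt = build_niah_prompt(
    needle="The magic number hidden in this report is 7421.",
    question="What is the magic number hidden in the report?",
    filler="The afternoon sky stayed a calm and unremarkable shade of grey.",
    target_tokens=100_000,        # scale toward 4 million tokens to match the reported setting
    depth=random.random(),        # sweep depths from 0.0 to 1.0 in a full evaluation
)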
MiniMax-Text-01 is designed as a general-purpose AI assistant, supporting applications such as complex question answering, creative writing, coding, and long-document analysis. Its ultra-long context window and inference optimizations make it well suited to document translation, summarization, and productivity scenarios involving large-scale information processing. The model also supports function calling, enabling integration with external tools and APIs via structured JSON outputs, and is compatible with the Transformers ecosystem.
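The sketch below illustrates the general shape of such a function-calling exchange; the tool schema and the JSON format of the model's tool call are illustrative assumptions, so the exact chat template and output format should be taken from the MiniMax-Text-01 model card.

# Illustrative function-calling round trip; schema and output format are assumptions.
import json

tools = [{
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

# Suppose the model, prompted with the tool list, emits a structured JSON call:
model_output = '{"name": "get_weather", "arguments": {"city": "Shanghai"}}'

call = json.loads(model_output)                                 # parse the structured output
result = {"city": call["arguments"]["city"], "temp_c": 21}      # stand-in for the real API response
tool_message = {"role": "tool", "content": json.dumps(result)}  # fed back to the model for the final answer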
Limitations and Ongoing Research
Despite strong performance in most areas, MiniMax-Text-01's capabilities in advanced programming and mathematical reasoning are constrained by current training-data limitations. Its hybrid architecture retains some softmax attention layers (one in every eight attention layers), and research continues into architectures based entirely on linear attention, aiming to enable effectively unlimited context windows with further efficiency gains.
Challenges persist in evaluating long-text reasoning, as available benchmarks typically focus on retrieval or synthetic tasks. Future efforts are dedicated to enhancing in-context learning and scaling data quality for extended reasoning ability in practical, real-world applications.
Model Series and Multimodal Extensions
MiniMax-Text-01 is the foundational language model in the MiniMax-01 series. Its multimodal sibling, MiniMax-VL-01, uses a Vision Transformer (ViT) for visual encoding, coupled with a two-layer MLP projector, and builds on MiniMax-Text-01 as its base language model. MiniMax-VL-01 supports variable visual input resolutions and has attained competitive scores on multiple vision-language benchmarks, although challenges remain for advanced mathematical reasoning in multimodal settings.
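A toy PyTorch sketch of this connection pattern is shown below: patch embeddings from a ViT are mapped through a two-layer MLP projector into the language model's embedding space; all dimensions are illustrative rather than the released configuration.

# Toy sketch of the ViT -> two-layer MLP projector -> LLM embedding path (illustrative sizes).
import torch
import torch.nn as nn

vit_dim, llm_dim, num_patches = 1024, 4096, 256   # assumed sizes, not MiniMax-VL-01's actual config

projector = nn.Sequential(                        # the two-layer MLP projector
    nn.Linear(vit_dim, llm_dim),
    nn.GELU(),
    nn.Linear(llm_dim, llm_dim),
)

patch_embeddings = torch.randn(1, num_patches, vit_dim)   # stand-in for ViT output
visual_tokens = projector(patch_embeddings)               # (1, 256, llm_dim): ready to interleave with text embeddings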
Comprehensive model comparison charts: (a) text benchmark performance, (b) multimodal benchmarks (MiniMax-VL-01), (c) RULER performance showing accuracy at increasing input lengths.
The MiniMax-Text-01 model weights are distributed under the MiniMax Model Agreement, and the accompanying code is available under the MIT License. Researchers and developers are directed to consult the license documentation for detailed usage and redistribution terms.