The simplest way to self-host MiniMax-Text-01. Launch a dedicated cloud GPU server running Laboratory OS to download and serve the model using any compatible app or framework.
Download model weights for local inference. Must be used with a compatible app, notebook, or codebase. May run slowly, or not work at all, depending on your system resources, particularly GPU(s) and available VRAM.
MiniMax-Text-01 is a 456B-parameter language model with a hybrid architecture combining Lightning Attention, Softmax Attention, and MoE with 32 experts. It is notable for supporting context lengths of up to 4M tokens at inference and shows strong capabilities in long-document processing and complex reasoning tasks.
MiniMax-Text-01 is a large language model featuring a groundbreaking hybrid architecture that combines Lightning Attention, Softmax Attention, and Mixture-of-Experts (MoE) technologies. With a total of 456 billion parameters and 45.9 billion activated parameters per token, the model represents a significant advancement in efficient large-scale language modeling, as detailed in the research paper.
The model's architecture comprises 80 layers with 64 attention heads (128 dimensions each) and 32 experts selected via a top-2 routing strategy. Each expert has a hidden dimension of 9216, while the model's hidden size is 6144. The model employs Rotary Position Embedding (RoPE) for positional encoding and has a vocabulary size of 200,064.
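The sketch below simply collects these figures in one place; the field names are illustrative and are not necessarily the keys used in the official configuration files.

```python
# Illustrative summary of the architecture figures quoted above.
# Field names are hypothetical and may not match the official
# Hugging Face config keys for MiniMax-Text-01.
minimax_text_01 = {
    "num_layers": 80,
    "num_attention_heads": 64,
    "head_dim": 128,                 # per-head dimension
    "hidden_size": 6144,
    "num_experts": 32,
    "experts_per_token": 2,          # top-2 routing
    "expert_hidden_size": 9216,      # FFN dimension inside each expert
    "position_embedding": "rope",
    "vocab_size": 200_064,
    "total_params": 456e9,
    "activated_params_per_token": 45.9e9,
}

# With top-2 routing, only 2 of the 32 experts run for each token,
# which is why the activated parameter count (~45.9B) is roughly a
# tenth of the 456B total.
print(f"Active experts per token: {minimax_text_01['experts_per_token']}"
      f" of {minimax_text_01['num_experts']}")
```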
Advanced parallel processing techniques, including LASP+, varlen ring attention, and Expert Tensor Parallel (ETP), enable the model to handle exceptionally long contexts of up to 1 million tokens during training and 4 million tokens during inference.
MiniMax-Text-01 demonstrates competitive performance against leading models like GPT-4, Claude-3.5, and others across various benchmarks, including general knowledge, reasoning, mathematics, and coding tasks.
The model particularly excels in long-context tasks, as demonstrated by its performance on benchmarks such as the "4M Needle in a Haystack" test, RULER, LongBench v2, and MTOB.
MiniMax-Text-01 is part of the broader MiniMax-01 series, which includes multimodal variants. Notable among these is MiniMax-VL-01, a vision-language model trained on 512 billion vision-language tokens. While both models share core architectural elements, MiniMax-VL-01 adds a lightweight Vision Transformer (ViT) module for visual processing capabilities.
The model weights are distributed in the Safetensors format and released under a Model Agreement provided in the license documentation. For deployment, the model supports multiple precision formats, with int8 quantization recommended for efficient inference.
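As a rough illustration of the int8 recommendation, the snippet below sketches loading the Hugging Face weights with transformers and Quanto-based int8 weight quantization. The repository id, device mapping, and generation settings are assumptions to verify against the official model card, which may also recommend excluding specific modules from quantization.

```python
# Hedged sketch: loading MiniMax-Text-01 with int8 weight quantization
# via transformers' Quanto integration (requires optimum-quanto).
# Repo id and settings below are assumptions; consult the official
# model card for the exact recommended configuration.
from transformers import AutoTokenizer, AutoModelForCausalLM, QuantoConfig

model_id = "MiniMaxAI/MiniMax-Text-01"  # assumed repository id

quant_config = QuantoConfig(weights="int8")  # int8 weight quantization

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",           # shard across available GPUs
    trust_remote_code=True,      # custom architecture code ships with the repo
)

inputs = tokenizer("Summarize the following document:", return_tensors="pt")
outputs = model.generate(**inputs.to(model.device), max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Even with int8 weights, serving a 456B-parameter model requires substantial multi-GPU VRAM, so the local-download option above is practical only on well-provisioned systems.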