The MiniMax model family represents a significant advancement in large language model technology, with its flagship model MiniMax-Text-01 introducing groundbreaking architectural innovations in early 2025. The family is developed by MiniMaxAI and encompasses both text-only and multimodal variants, all sharing core architectural elements while specializing in different capabilities. According to the official documentation, the family is designed to push the boundaries of context length handling and efficient parameter utilization in large language models.
The MiniMax family is built on a hybrid architecture that combines several techniques. As detailed in the research paper, linear-complexity Lightning Attention layers are interleaved with standard softmax attention, and the feed-forward blocks use a Mixture-of-Experts (MoE) design. The flagship MiniMax-Text-01 has 456 billion total parameters with 45.9 billion activated per token, a notable advance in efficient parameter utilization.
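The MoE idea can be illustrated with a minimal NumPy sketch of top-2 routing: each token's gate logits select two experts, and their outputs are combined with softmax weights. This is a toy illustration of the routing principle, not MiniMax's production kernel; the function names and shapes are illustrative.

```python
import numpy as np

def top2_moe_forward(x, gate_w, experts, k=2):
    """Route each token to its top-k experts (k=2, as in MiniMax-Text-01)
    and combine their outputs, weighted by a softmax over the selected
    gate logits. Shapes: x (tokens, hidden), gate_w (hidden, n_experts)."""
    logits = x @ gate_w                           # (tokens, n_experts)
    topk = np.argsort(logits, axis=-1)[:, -k:]    # indices of the k best experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = logits[t, topk[t]]
        weights = np.exp(sel - sel.max())
        weights /= weights.sum()                  # softmax over the k selected experts
        for w, e in zip(weights, topk[t]):
            out[t] += w * experts[e](x[t])        # weighted sum of expert outputs
    return out
```

Because each token activates only 2 of 32 experts, most expert parameters stay idle per token, which is how 456B total parameters reduce to 45.9B activated.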
The architecture comprises 80 layers with 64 attention heads (128-dimensional each) and 32 experts selected by a top-2 routing strategy. Each expert has a 9216-dimensional feed-forward hidden size, while the model's hidden size is 6144. The models employ Rotary Position Embedding (RoPE) for positional encoding and use a vocabulary of 200,064 tokens.
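RoPE encodes position by rotating pairs of channels in each query and key by position-dependent angles, so attention scores depend on relative rather than absolute position. A minimal sketch for a single head (the 128-dimensional head size is MiniMax's; the base frequency here is the common default, assumed rather than confirmed):

```python
import numpy as np

def apply_rope(x, positions, base=10000.0):
    """Apply Rotary Position Embedding to a (seq, head_dim) array.
    Channel pairs (0,1), (2,3), ... are rotated by angles that grow
    with position and shrink with channel index; head_dim must be even
    (128 per head in MiniMax-Text-01)."""
    d = x.shape[-1]
    inv_freq = base ** (-np.arange(0, d, 2) / d)   # (d/2,) per-pair frequencies
    angles = np.outer(positions, inv_freq)         # (seq, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin           # 2-D rotation of each pair
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```

Since each pair is rotated, vector norms are preserved, and position 0 leaves the input unchanged.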
MiniMax-Text-01 serves as the foundation model of the family, specializing in text processing and generation. Its most notable feature is the ability to handle exceptionally long contexts: up to 1 million tokens during training and 4 million tokens during inference. This capability is enabled by parallel processing techniques including LASP+, varlen ring attention, and Expert Tensor Parallel (ETP).
The family includes MiniMax-VL-01, a vision-language model that builds upon the base architecture of MiniMax-Text-01. This variant incorporates a lightweight Vision Transformer (ViT) module for visual processing capabilities while maintaining the core text processing architecture. According to the official announcement, MiniMax-VL-01 was trained on 512 billion vision-language tokens, enabling sophisticated multimodal understanding and generation capabilities.
The MiniMax family demonstrates competitive performance against leading models in the field, including GPT-4o and Claude-3.5-Sonnet. The models excel particularly in tasks requiring long-context understanding, as evidenced by their performance on specialized benchmarks such as the 4M-token Needle in a Haystack test, RULER, LongBench v2, and MTOB.
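The needle-in-a-haystack setup is simple to sketch: plant one distinctive fact at a chosen depth inside a long distractor document, then check whether the model's answer recovers it. A toy harness below illustrates the idea; it is not the official 4M benchmark, and all names are illustrative.

```python
import random

def make_haystack(needle, n_sentences, position, seed=0):
    """Build a distractor document of `n_sentences` filler sentences with
    `needle` inserted at relative depth `position` (0.0 = start, 1.0 = end).
    Scaling n_sentences up is how the context-length axis of the test grows."""
    rng = random.Random(seed)
    filler = [
        f"Note {i}: the sky was {rng.choice(['grey', 'blue', 'pale'])} that day."
        for i in range(n_sentences)
    ]
    filler.insert(int(position * len(filler)), needle)
    return " ".join(filler)

def found_needle(answer, expected):
    """Trivial scorer: did the model's answer contain the planted fact?"""
    return expected.lower() in answer.lower()
```

In practice the harness sweeps both context length and needle depth, producing the familiar retrieval heatmap.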
The MiniMax model weights are distributed in the Safetensors format and released under a specific Model Agreement, as detailed in the license documentation. The models support various precision formats, with int8 quantization recommended for optimal performance in production environments.
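To make the int8 recommendation concrete, here is a minimal sketch of symmetric per-tensor int8 quantization, the simplest variant of the technique (real deployments typically use per-channel scales and calibrated activation quantization; this is an illustration, not MiniMax's scheme):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: choose a scale so the
    largest-magnitude weight maps to 127, then round to int8. Storage
    drops from 4 bytes (float32) to 1 byte per weight."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float tensor; rounding error is at most
    half a quantization step (scale / 2) per weight."""
    return q.astype(np.float32) * scale
```

The 4x memory reduction is what makes serving a 456B-parameter model practical, at the cost of a bounded per-weight rounding error.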
According to the technical documentation, the family's implementation emphasizes efficient deployment and scalability. The models can be deployed across various computing environments while maintaining their performance characteristics, particularly their exceptional context length handling capabilities.
The MiniMax family has demonstrated significant potential across a range of applications, particularly in scenarios that require extensive context understanding and processing.
While the initial release of the MiniMax family in early 2025 has already demonstrated impressive capabilities, the architecture's modular nature suggests potential for future expansions and improvements. The successful integration of multimodal capabilities in MiniMax-VL-01 indicates the possibility of additional specialized variants targeting specific domains or modalities.