The simplest way to self-host Mistral 7B. Launch a dedicated cloud GPU server running Lab Station OS to download and serve the model using any compatible app or framework.
Download model weights for local inference. Must be used with a compatible app, notebook, or codebase. May run slowly, or not work at all, depending on your system resources, particularly GPU(s) and available VRAM.
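As an illustration of the local-inference path, the sketch below loads the weights with the Hugging Face transformers library and the public mistralai/Mistral-7B-v0.1 checkpoint; both the library and the checkpoint name are assumptions here, and the fp16 weights alone need roughly 15 GB of VRAM, so quantization or CPU offloading may be required on smaller GPUs.

```python
# Minimal local-inference sketch (assumes the Hugging Face transformers
# library and the public mistralai/Mistral-7B-v0.1 checkpoint).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # ~15 GB of VRAM for the weights alone
    device_map="auto",           # spread layers across available GPU(s)
)

inputs = tokenizer("Mistral 7B is", return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```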
Mistral 7B (September 2023) is a 7B-parameter language model that matches the capabilities of larger models through Grouped-Query Attention and Sliding Window Attention. It handles 8192-token contexts and excels at mathematics, reasoning, and code tasks, outperforming Llama 2 13B across benchmarks.
Mistral 7B is a 7-billion parameter Large Language Model (LLM) released by Mistral AI on September 27, 2023. The model's architecture is based on a transformer design with several key innovations that enhance its efficiency and performance. It features a dimension of 4096 and 32 layers, with a context length of 8192 tokens.
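For reference, the hyperparameters reported in the paper can be gathered into a small configuration object; the class and field names below are illustrative rather than taken from any particular codebase.

```python
from dataclasses import dataclass

@dataclass
class MistralConfig:
    """Hyperparameters reported for Mistral 7B (names are illustrative)."""
    dim: int = 4096           # model (hidden) dimension
    n_layers: int = 32        # transformer layers
    n_heads: int = 32         # query heads
    n_kv_heads: int = 8       # key/value heads shared across query heads (GQA)
    head_dim: int = 128       # per-head dimension
    hidden_dim: int = 14336   # feed-forward inner dimension
    vocab_size: int = 32000   # byte-fallback BPE vocabulary
    context_len: int = 8192   # maximum context length
    window_size: int = 4096   # sliding attention window
```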
The model incorporates two significant architectural improvements: Grouped-Query Attention (GQA) and Sliding Window Attention (SWA). GQA speeds up inference and reduces memory requirements, while SWA allows efficient processing of longer sequences. Combined with the optimizations in FlashAttention and xFormers, these mechanisms yield a 2x speed-up over a vanilla attention baseline at a sequence length of 16k with a 4k window.
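The sketch below illustrates both mechanisms in a few lines of PyTorch: each key/value head is shared by a group of query heads (GQA), and the attention mask only lets a position see the previous `window` tokens (SWA). It is a simplified illustration, not the optimized FlashAttention/xFormers kernels the model actually relies on.

```python
import torch

def sliding_window_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    # Position i may attend to positions j with i - window < j <= i.
    i = torch.arange(seq_len).unsqueeze(1)
    j = torch.arange(seq_len).unsqueeze(0)
    return (j <= i) & (j > i - window)

def grouped_query_attention(q, k, v, window: int):
    # q: (n_heads, seq, head_dim); k, v: (n_kv_heads, seq, head_dim)
    n_heads, seq_len, head_dim = q.shape
    group = n_heads // k.shape[0]
    # GQA: each key/value head serves `group` query heads.
    k = k.repeat_interleave(group, dim=0)
    v = v.repeat_interleave(group, dim=0)
    scores = q @ k.transpose(-1, -2) / head_dim**0.5
    mask = sliding_window_causal_mask(seq_len, window)
    scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

# Toy shapes mirroring Mistral 7B's 32 query heads and 8 KV heads.
q, k, v = torch.randn(32, 16, 128), torch.randn(8, 16, 128), torch.randn(8, 16, 128)
print(grouped_query_attention(q, k, v, window=8).shape)  # torch.Size([32, 16, 128])
```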
For tokenization, Mistral 7B employs a Byte-fallback BPE tokenizer. The model also utilizes a rolling buffer cache to optimize memory usage during inference, along with pre-filling and chunking techniques for efficient handling of long sequences.
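A rolling buffer cache caps memory by storing the keys and values for position i in slot i mod W, where W is the window size, so the cache never holds more than W entries however long the sequence grows. The sketch below captures that idea for a single layer; names and shapes are illustrative, and pre-fill/chunking are omitted for brevity.

```python
import torch

class RollingKVCache:
    """Fixed-size KV cache for one layer: slot = position % window."""

    def __init__(self, window: int, n_kv_heads: int, head_dim: int):
        self.window = window
        self.k = torch.zeros(window, n_kv_heads, head_dim)
        self.v = torch.zeros(window, n_kv_heads, head_dim)
        self.pos = 0  # number of tokens seen so far

    def update(self, k_new: torch.Tensor, v_new: torch.Tensor) -> None:
        # k_new, v_new: (n_kv_heads, head_dim) for the current token.
        slot = self.pos % self.window      # overwrites the oldest entry once full
        self.k[slot], self.v[slot] = k_new, v_new
        self.pos += 1

    def get(self):
        # Return cached keys/values in chronological order.
        if self.pos <= self.window:
            return self.k[:self.pos], self.v[:self.pos]
        start = self.pos % self.window     # slot holding the oldest live token
        order = torch.arange(start, start + self.window) % self.window
        return self.k[order], self.v[order]
```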
Mistral 7B demonstrates exceptional performance across various benchmarks, surpassing Llama 2 13B on all metrics tested in the research paper. The model shows particular strength in code and reasoning tasks, performing equivalently to a Llama 2 model more than three times its size on reasoning, comprehension, and STEM benchmarks. It even outperforms Llama 1 34B in mathematics, code generation, and reasoning tasks.
Benchmark evaluations covered commonsense reasoning, world knowledge, reading comprehension, mathematics, code generation, and aggregated suites such as MMLU, BBH, and AGIEval.
The evaluation methodology differs from some previously published benchmarks, for example by using the hand-verified subset of MBPP and omitting Wikipedia contexts for TriviaQA, ensuring more rigorous testing conditions.
The Mistral 7B family includes two main variants:
Mistral 7B Base Model: The foundation model suitable for various downstream tasks and fine-tuning applications. It's particularly notable for its strong performance in code-related tasks while maintaining robust English language capabilities.
Mistral 7B Instruct: A fine-tuned chat model that demonstrates superior performance compared to Llama 2 13B chat models on MT-Bench. This variant excels in instruction-following tasks and has shown strong results in both human and automated evaluations.
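As a usage sketch for the Instruct variant, the example below relies on the tokenizer's chat template to wrap the message in Mistral's [INST] ... [/INST] format; it again assumes the Hugging Face transformers library and the public mistralai/Mistral-7B-Instruct-v0.1 checkpoint.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.1"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain sliding window attention in two sentences."}]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=120, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```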
Both models are released under the Apache 2.0 license, making them accessible for both research and commercial applications. However, it's important to note that they ship without built-in moderation mechanisms, and Mistral AI has indicated plans to work with the community to improve safety and guardrails.