Browse Models
The simplest way to self-host Llama 2 7B. Launch a dedicated cloud GPU server running Lab Station OS to download and serve the model using any compatible app or framework.
Download model weights for local inference. Must be used with a compatible app, notebook, or codebase. May run slowly, or not work at all, depending on your system resources, particularly GPU(s) and available VRAM.
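Whether local inference is feasible comes down mostly to VRAM. A rough sketch of the arithmetic (the function name is illustrative; real usage also needs room for activations and the KV cache):

```python
def estimate_weight_vram_gb(n_params: float, bytes_per_param: float) -> float:
    """Rough VRAM needed just to hold the model weights, in GiB.

    Ignores activations, KV cache, and framework overhead, which add more.
    """
    return n_params * bytes_per_param / 1024**3

# Llama 2 7B (rounded here to 7e9 parameters) in common precisions:
fp16_gib = estimate_weight_vram_gb(7e9, 2)    # half precision: ~13 GiB
int4_gib = estimate_weight_vram_gb(7e9, 0.5)  # 4-bit quantized: ~3.3 GiB
```

This is why the 7B model fits on a single consumer GPU at reduced precision, while full-precision serving needs a larger card.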
Llama 2 7B is Meta AI's compact language model with 7 billion parameters, trained on 2 trillion tokens. It features a 4,096-token context window and performs well at code generation, reasoning, and math tasks. Available in base and chat-tuned variants, it offers efficient performance without the grouped-query attention (GQA) mechanism used in the larger variants.
Llama 2 7B is a large language model (LLM) developed by Meta AI, released on July 18, 2023. It's part of the Llama 2 family of models, which includes variants ranging from 7B to 70B parameters. The model utilizes an optimized transformer architecture and is designed for auto-regressive text generation. Unlike its larger 70B parameter sibling, the 7B model does not employ Grouped-Query Attention (GQA).
The model was trained between January and July 2023 on approximately 2 trillion tokens of publicly available online data, explicitly excluding Meta user data. Training utilized a learning rate of 3.0 x 10^-4 and a global batch size of 4M tokens. The model supports a context length of 4,096 tokens, making it suitable for processing moderately long text sequences.
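As a quick sanity check, the stated token budget and batch size imply the rough number of optimizer steps in the training run:

```python
# Back-of-the-envelope: optimizer steps implied by the stated schedule.
total_tokens = 2e12   # ~2 trillion training tokens
batch_tokens = 4e6    # global batch size of 4M tokens
steps = total_tokens / batch_tokens
print(f"{steps:,.0f} optimizer steps")  # 500,000
```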
Llama 2 7B demonstrates significant improvements over its predecessor, particularly in instruction-following capabilities and reasoning abilities. The model has been benchmarked across various tasks, showing strong performance in code generation, commonsense reasoning, and math.
Performance evaluations primarily focus on English language tasks, and the model may not generalize well to other languages. Safety assessments using TruthfulQA and ToxiGen benchmarks indicate improvements in toxicity reduction compared to Llama 1.
The model is available in two main variants: a pretrained base model (Llama 2) and a chat-tuned model (Llama 2-Chat).
The fine-tuned versions utilize supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to enhance helpfulness and safety. For optimal chat performance, specific formatting is recommended, including the use of [INST] and <<SYS>> tags, along with appropriate BOS and EOS tokens.
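As a concrete illustration, a minimal single-turn prompt builder following the published Llama 2 chat template might look like this (the function name is illustrative; note that most tokenizers add the BOS token automatically, so it is shown explicitly here only for clarity):

```python
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"
BOS = "<s>"

def build_prompt(system: str, user: str) -> str:
    """Wrap a single-turn exchange in the tags Llama 2-Chat was trained on."""
    return f"{BOS}{B_INST} {B_SYS}{system}{E_SYS}{user} {E_INST}"

prompt = build_prompt("You are a helpful assistant.", "Explain GQA briefly.")
```

Multi-turn conversations repeat the [INST] ... [/INST] pattern per turn, closing each completed model reply with the EOS token (</s>).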
Compared to its larger siblings (13B and 70B parameters), the 7B model offers a balance between performance and resource efficiency, making it more suitable for deployment on less powerful hardware while maintaining reasonable capabilities across various tasks.
Llama 2 7B is available under a custom commercial license from Meta, permitting both research and commercial use. Users must adhere to the Acceptable Use Policy, which prohibits generating illegal content or violating user rights. The model is static, trained on an offline dataset, with Meta planning future safety improvements based on community feedback.