The simplest way to self-host Llama 2 70B. Launch a dedicated cloud GPU server running Lab Station OS to download and serve the model using any compatible app or framework.
Download model weights for local inference. Must be used with a compatible app, notebook, or codebase. May run slowly, or not work at all, depending on your system resources, particularly GPU(s) and available VRAM.
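For local inference, one common path is Hugging Face Transformers. The sketch below is illustrative, not an official recipe: it assumes you have been granted access to the gated meta-llama/Llama-2-70b-hf repository and have enough GPU memory (roughly 140 GB in fp16; 4-bit quantization reduces this substantially).

```python
# Minimal local-inference sketch using Hugging Face Transformers.
# Assumes access to the gated meta-llama repo and sufficient GPU memory.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-70b-hf"  # base (non-chat) weights

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to roughly halve memory use
    device_map="auto",          # shard layers across available GPUs / CPU
)

inputs = tokenizer("Grouped-Query Attention is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```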
Llama 2 70B is Meta's largest open language model, trained on 2 trillion tokens with a 4k context window. It features a Grouped-Query Attention (GQA) architecture and extensive fine-tuning through RLHF with over 1M human annotations. It is notable for improved reasoning, coding, and knowledge capabilities compared to the smaller Llama variants.
Llama 2 70B is a large language model (LLM) developed by Meta as part of the Llama 2 family of models. It is the largest variant in the series, with 70 billion parameters. The model uses an optimized transformer architecture incorporating Grouped-Query Attention (GQA) for improved inference scalability, a feature that distinguishes the 70B variant from the smaller models in the family. As detailed in the model documentation, the model was trained on 2 trillion tokens of publicly available online data, with a pretraining data cutoff of September 2022; some fine-tuning data extends to July 2023.
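To make the GQA idea concrete, here is an illustrative sketch (not Meta's implementation): a small set of key/value heads is shared across groups of query heads, which shrinks the KV cache and speeds up inference. The head counts follow the commonly cited 70B configuration (64 query heads, 8 key/value heads); the causal mask is omitted for brevity.

```python
# Illustrative Grouped-Query Attention sketch (not Meta's implementation).
import torch

n_q_heads, n_kv_heads, head_dim, seq = 64, 8, 128, 16  # illustrative sizes
group = n_q_heads // n_kv_heads                         # query heads per KV head

q = torch.randn(1, n_q_heads, seq, head_dim)
k = torch.randn(1, n_kv_heads, seq, head_dim)   # far fewer K/V heads than Q heads
v = torch.randn(1, n_kv_heads, seq, head_dim)

# Repeat each KV head so it lines up with its group of query heads.
k = k.repeat_interleave(group, dim=1)   # -> (1, n_q_heads, seq, head_dim)
v = v.repeat_interleave(group, dim=1)

scores = (q @ k.transpose(-2, -1)) / head_dim ** 0.5   # causal mask omitted for brevity
attn_out = torch.softmax(scores, dim=-1) @ v           # (1, n_q_heads, seq, head_dim)
```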
The training process was computationally intensive, utilizing 1,720,320 GPU hours on A100-80GB GPUs. This resulted in an estimated 291.42 tCO2eq of emissions, which Meta fully offset through its sustainability program. The model accepts text input and generates text output, with a maximum context length of 4k tokens.
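As a rough sanity check on those figures, the back-of-the-envelope calculation below converts GPU hours into energy and an implied grid carbon intensity; the 400 W per-GPU power draw is an assumption for illustration, not a figure taken from this page.

```python
# Back-of-the-envelope check of the reported compute and emissions figures.
gpu_hours = 1_720_320      # reported training time for the 70B model
tdp_kw = 0.4               # assumed ~400 W per A100-80GB (illustrative assumption)

energy_mwh = gpu_hours * tdp_kw / 1000
print(f"~{energy_mwh:.0f} MWh of GPU energy")              # ~688 MWh
print(f"implied ~{291.42 / energy_mwh:.2f} tCO2eq per MWh")  # intensity implied by 291.42 tCO2eq
```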
Llama 2 70B demonstrates significant improvements over its predecessor and smaller variants across multiple benchmarks. According to Meta's research publication, the model excels in various areas including code generation, commonsense reasoning, world knowledge, reading comprehension, and mathematical reasoning. The fine-tuned versions (Llama-2-Chat) perform comparably to closed-source models like ChatGPT and PaLM in terms of helpfulness and safety.
The model family includes smaller variants (7B and 13B parameters), but the 70B model consistently outperforms these in benchmark testing. Safety improvements are also notable, with lower toxicity rates compared to Llama 1. The model is primarily intended for English language use, though its capabilities extend to various applications including chatbots, code generation, and question answering.
Llama 2 70B was released on July 18, 2023, under a custom commercial license that allows for both research and commercial use, subject to specific terms and conditions. This licensing approach represents a significant contribution to the open-source AI community, making advanced LLM technology more accessible for research and development purposes.
Users should be aware of potential risks, including the possibility of generating inaccurate, biased, or objectionable content. Meta recommends thorough safety testing before deployment, as outlined in their Responsible Use Guide.
The Llama 2 family (7B, 13B, and 70B) is available through the Hugging Face platform, with both base and chat-optimized versions accessible at each parameter size. For those interested in exploring the model's capabilities, Meta provides extensive documentation and guidelines across its platforms and resources.
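The chat-tuned variants expect a specific prompt layout. The sketch below uses the tokenizer's chat template to render Llama 2's [INST]/&lt;&lt;SYS&gt;&gt; format; it assumes access to the gated meta-llama/Llama-2-70b-chat-hf repository and a transformers version recent enough to ship that template.

```python
# Sketch of building a prompt for the chat-tuned variant.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-70b-chat-hf")

messages = [
    {"role": "system", "content": "You are a concise, helpful assistant."},
    {"role": "user", "content": "Summarize Grouped-Query Attention in one sentence."},
]

# Renders the [INST] / <<SYS>> format the chat models were fine-tuned on.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```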