The simplest way to self-host Llama 3.1 70B. Launch a dedicated cloud GPU server running Laboratory OS to download and serve the model using any compatible app or framework.
Download model weights for local inference. The weights must be used with a compatible app, notebook, or codebase, and may run slowly, or not at all, depending on your system resources, particularly your GPU(s) and available VRAM.
Meta's Llama 3.1 70B is a multilingual model trained on over 15 trillion tokens plus synthetic data, featuring Grouped-Query Attention and combined SFT/RLHF alignment. It performs strongly on reasoning tasks across its eight officially supported languages and ships with integrated safety tools such as Llama Guard 3 and Code Shield.
Meta's Llama 3.1 70B represents a significant advancement in large language model technology, building upon previous iterations in the Llama family. Released on July 23, 2024, this auto-regressive language model utilizes an optimized transformer architecture featuring Grouped-Query Attention (GQA) for enhanced inference scalability. The model is part of a broader family that includes 8B and 405B parameter variants, all designed with a focus on multilingual capabilities and improved performance.
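The idea behind Grouped-Query Attention can be sketched in a few lines: several query heads share each key/value head, which shrinks the KV cache and speeds up inference. The dimensions below are toy values chosen for illustration, not the model's real configuration, and the code is a minimal NumPy sketch rather than the production implementation:

```python
import numpy as np

def gqa(q, k, v, n_q_heads, n_kv_heads):
    """Toy Grouped-Query Attention. q: (n_q_heads, seq, d); k, v: (n_kv_heads, seq, d)."""
    group = n_q_heads // n_kv_heads           # query heads per KV head
    # Repeat each KV head so every query head has a matching K/V to attend over
    k_rep = np.repeat(k, group, axis=0)        # (n_q_heads, seq, d)
    v_rep = np.repeat(v, group, axis=0)
    d = q.shape[-1]
    scores = q @ k_rep.transpose(0, 2, 1) / np.sqrt(d)
    # Numerically stable softmax over the key axis
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)
    return weights @ v_rep                     # (n_q_heads, seq, d)

rng = np.random.default_rng(0)
seq, d, n_q, n_kv = 5, 16, 8, 2                # 8 query heads sharing 2 KV heads
q = rng.standard_normal((n_q, seq, d))
k = rng.standard_normal((n_kv, seq, d))
v = rng.standard_normal((n_kv, seq, d))
out = gqa(q, k, v, n_q, n_kv)
print(out.shape)  # (8, 5, 16)
```

With 2 KV heads serving 8 query heads, the KV cache is a quarter the size of standard multi-head attention while the output shape is unchanged, which is the scalability benefit GQA brings to inference.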
The model underwent extensive training on over 15 trillion tokens of publicly available data, with a cutoff date of December 2023. The training process was supplemented by 25 million synthetically generated examples used for fine-tuning. The 70B parameter variant specifically required 7.0M GPU hours on H100-80GB GPUs for training, resulting in location-based greenhouse gas emissions of 2,040 tons CO2eq (though market-based emissions were zero due to Meta's renewable energy sourcing).
The model demonstrates strong multilingual capabilities, supporting English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai, though it can generate text in additional languages. This multilingual proficiency is achieved through both supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF), optimizing for both helpfulness and safety.
Developers can use the model through either the Transformers library (version 4.43.0 or later) or the original llama codebase. The Transformers pipeline is particularly well suited to conversational inference, while the original codebase requires following the instructions in its repository. More detailed implementation guidance is available in the Meta Llama GitHub repository and Meta Llama Recipes.
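A minimal sketch of conversational inference with the Transformers pipeline is shown below. It assumes access to the gated `meta-llama/Llama-3.1-70B-Instruct` checkpoint on Hugging Face (the license must be accepted first) and enough GPU memory to hold roughly 140 GB of bf16 weights; the helper names are illustrative, not part of any official API:

```python
MODEL_ID = "meta-llama/Llama-3.1-70B-Instruct"  # gated repo; requires accepted license

def build_messages(user_prompt, system_prompt="You are a helpful assistant."):
    """Chat-format input consumed by the Transformers text-generation pipeline."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

def generate(user_prompt, max_new_tokens=128):
    # Imported lazily: actually loading the model requires multiple high-memory GPUs.
    import torch
    import transformers

    pipe = transformers.pipeline(
        "text-generation",
        model=MODEL_ID,
        model_kwargs={"torch_dtype": torch.bfloat16},
        device_map="auto",  # shard the weights across available GPUs
    )
    out = pipe(build_messages(user_prompt), max_new_tokens=max_new_tokens)
    # The pipeline returns the full chat transcript; the last message is the reply.
    return out[0]["generated_text"][-1]["content"]
```

On hardware that cannot fit the 70B weights, the same code works with the 8B variant by swapping the model ID, which is a practical way to prototype before scaling up.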
The model is distributed under the Llama 3.1 Community License, which includes specific provisions for usage. Notably, users exceeding 700 million monthly active users require a separate license agreement. The license incorporates a detailed Acceptable Use Policy that prohibits generating illegal content, engaging in harmful activities, or intentionally deceiving others.
Meta has implemented various safety measures, including Llama Guard 3, Prompt Guard, and Code Shield. The company actively encourages community feedback through their output reporting mechanism and maintains a bug bounty program for identifying potential issues. Comprehensive information about responsible use and safety considerations can be found in Meta's Responsible Use Guide.