The simplest way to self-host Llama 3.3 70B. Launch a dedicated cloud GPU server running Laboratory OS to download and serve the model using any compatible app or framework.
Download model weights for local inference. Must be used with a compatible app, notebook, or codebase. May run slowly, or not work at all, depending on your system resources, particularly GPU(s) and available VRAM.
Meta's 70B-parameter multilingual model supports 8 languages and excels at reasoning, code, and math tasks. Uses Grouped-Query Attention and combines supervised fine-tuning with RLHF. Pretrained on 15T tokens and fine-tuned with over 25M synthetic examples. Knowledge cutoff: Dec 2023.
Meta's Llama 3.3-70B-Instruct, released on December 6, 2024, represents a significant advancement in multilingual language models. This 70-billion parameter model uses an optimized transformer architecture enhanced with Grouped-Query Attention (GQA) for improved inference scalability. The model combines supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to achieve better helpfulness and safety than previous iterations.
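The memory benefit of GQA can be seen in a toy sketch: several query heads share a single key/value head, which shrinks the KV cache that must be kept in GPU memory during inference. The head counts below are illustrative toy values, not Llama 3.3's actual configuration.

```python
import torch

def grouped_query_attention(q, k, v):
    """Toy GQA: q has more heads than k/v; each KV head serves a group of query heads."""
    # q: (batch, n_q_heads, seq, d); k, v: (batch, n_kv_heads, seq, d)
    n_q_heads, d = q.shape[1], q.shape[-1]
    group = n_q_heads // k.shape[1]
    # Repeat each KV head so it is shared by all query heads in its group.
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)
    scores = (q @ k.transpose(-2, -1)) / d**0.5
    return torch.softmax(scores, dim=-1) @ v

q = torch.randn(1, 8, 4, 16)   # 8 query heads
k = torch.randn(1, 2, 4, 16)   # only 2 KV heads cached
v = torch.randn(1, 2, 4, 16)
out = grouped_query_attention(q, k, v)
print(out.shape)  # torch.Size([1, 8, 4, 16])
```

Here the KV cache is a quarter the size of standard multi-head attention with 8 KV heads, while the output shape matches what full multi-head attention would produce.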
The model was trained on an extensive dataset comprising over 15 trillion tokens of publicly available data, supplemented by more than 25 million synthetically generated examples for fine-tuning. The training process was computationally intensive, requiring 39.3 million GPU hours on H100-80GB GPUs. This resulted in estimated location-based greenhouse gas emissions of 2,040 tons CO2eq, though Meta's use of renewable energy resulted in net-zero market-based emissions. The detailed methodology for energy use and emissions calculations is available in their research publication.
Llama 3.3-70B-Instruct demonstrates exceptional multilingual capabilities, officially supporting eight languages: English, French, German, Hindi, Italian, Portuguese, Spanish, and Thai. While trained on a broader range of languages, these core languages represent the model's primary focus for dialogue and instruction-following tasks. The model's knowledge cutoff extends to December 2023, making it one of the more current large language models available.
The model shows significant improvements over its predecessors, including the Llama 3.1 8B Instruct, 70B Instruct, and 405B Instruct versions, and performs strongly across a wide range of benchmarks.
The model can be used with the `transformers` library (version 4.45.0 or later) through pipelines or the `generate()` function. It supports both conversational inference and tool use via chat templates. For memory optimization, users can apply 8-bit and 4-bit quantization using `bitsandbytes`. Meta provides comprehensive documentation for tool use and maintains the original `llama` codebase.
The model is distributed under the Llama 3.3 Community License Agreement, which provides non-exclusive, worldwide, non-transferable, royalty-free rights for use, reproduction, distribution, and modification of the Llama Materials. Users must comply with specific conditions, including displaying "Built with Llama" and adhering to the Acceptable Use Policy. Organizations exceeding 700 million monthly active users require a separate license from Meta.
Meta emphasizes responsible deployment through various safeguards including Llama Guard 3, Prompt Guard, and Code Shield. The company maintains a comprehensive responsible use guide and encourages community contributions for safety improvements.