The simplest way to self-host Llama 3 70B. Launch a dedicated cloud GPU server running Laboratory OS to download and serve the model using any compatible app or framework.
Download model weights for local inference. Must be used with a compatible app, notebook, or codebase. May run slowly, or not work at all, depending on your system resources, particularly GPU(s) and available VRAM.
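Whether local inference is feasible mostly comes down to fitting the weights in VRAM. A rough back-of-envelope check (weights only; the KV cache and activations need additional headroom, and sub-16-bit precisions assume a quantized build of the model):

```python
def approx_weight_memory_gb(n_params_billion, bytes_per_param):
    """Lower-bound estimate: model weights only, ignoring KV cache,
    activations, and framework overhead."""
    return n_params_billion * 1e9 * bytes_per_param / 1024**3

# 70B parameters at common precisions
for name, bytes_per_param in [("fp16/bf16", 2), ("int8", 1), ("int4", 0.5)]:
    print(f"{name}: ~{approx_weight_memory_gb(70, bytes_per_param):.0f} GB")
```

At bf16 the weights alone are roughly 130 GB, which is why the 70B model typically requires multiple GPUs or aggressive quantization to run locally.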
Meta's Llama 3 70B (April 2024) features 70 billion parameters and employs Grouped-Query Attention for efficient scaling. Trained on 15 trillion tokens through December 2023, it shows notable improvements in reasoning, dialogue, and code generation compared to previous Llama versions.
Meta's Llama 3 family represents a significant advancement in large language model technology, with the 70B parameter model serving as its flagship offering. Released on April 18, 2024, the model family includes both 8B and 70B parameter variants, each available in pretrained and instruction-tuned versions. The models utilize an optimized transformer architecture with Grouped-Query Attention (GQA) for enhanced inference scalability.
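The idea behind GQA is that several query heads share a single key/value head, shrinking the KV cache and speeding up inference. A minimal NumPy sketch of the mechanism (the 64 query heads and 8 KV heads match Llama 3 70B's published configuration; the head dimension and sequence length here are illustrative, not the model's real values):

```python
import numpy as np

def grouped_query_attention(q, k, v):
    """Minimal GQA sketch: q has n_heads, k/v have n_kv_heads;
    each group of n_heads // n_kv_heads query heads shares one KV head."""
    n_heads, seq, d = q.shape
    n_kv_heads = k.shape[0]
    group = n_heads // n_kv_heads
    # Broadcast each KV head across its group of query heads
    k = np.repeat(k, group, axis=0)                 # (n_heads, seq, d)
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)  # scaled dot-product
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ v                              # (n_heads, seq, d)

rng = np.random.default_rng(0)
q = rng.standard_normal((64, 4, 16))  # 64 query heads, as in Llama 3 70B
k = rng.standard_normal((8, 4, 16))   # only 8 KV heads to project and cache
v = rng.standard_normal((8, 4, 16))
out = grouped_query_attention(q, k, v)
print(out.shape)  # (64, 4, 16)
```

With 8x fewer KV heads, the KV cache is 8x smaller than standard multi-head attention, which is the main source of the inference-scalability gain the model card cites.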
The Llama 3 70B model was trained on over 15 trillion tokens of publicly available data, explicitly excluding Meta user data. Its knowledge cutoff is December 2023, while the smaller 8B variant's cutoff is March 2023. Training ran on Meta's Research SuperCluster and production clusters, producing an estimated carbon footprint of 2290 tCO2eq, which Meta has fully offset.
The instruction-tuned variants underwent supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to enhance helpfulness and safety. These models excel particularly in dialogue tasks, demonstrating superior performance compared to many existing open-source chat models.
The Llama 3 70B model demonstrates significant improvements over its predecessors across various benchmarks, including MMLU, AGIEval, CommonSenseQA, Winogrande, BIG-Bench Hard, ARC-Challenge, TriviaQA-Wiki, SQuAD, QuAC, BoolQ, and DROP. The instruction-tuned 70B variant particularly shines in tasks requiring complex reasoning and code generation.
The model can be run with either the transformers library or Meta's original llama3 codebase. With transformers, a text-generation pipeline takes only a few lines:

```python
import torch
import transformers

model_id = "meta-llama/Meta-Llama-3-70B-Instruct"
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},  # bf16 weights
    device_map="auto",  # shard across available GPUs
)
```

The `device_map="auto"` setting lets Accelerate distribute the 70B weights across all available GPUs, which is generally required given the model's memory footprint.
Meta has placed a strong emphasis on responsible AI development, implementing comprehensive safety measures including red teaming and adversarial evaluations. The company provides additional safety tools such as Meta Llama Guard 2 and Code Shield to help mitigate potential risks.
The model is available under a custom commercial license, primarily intended for commercial and research use in English, though fine-tuning for other languages is permitted under the license terms.