The simplest way to self-host Gemma 2 9B. Launch a dedicated cloud GPU server running Lab Station OS to download and serve the model using any compatible app or framework.
Download model weights for local inference. Must be used with a compatible app, notebook, or codebase. May run slowly, or not work at all, depending on your system resources, particularly GPU(s) and available VRAM.
Gemma 2 9B is a 9 billion parameter, decoder-only language model from Google, trained on roughly 8 trillion tokens. It can be served with inference up to 6x faster when compiled with torch.compile, and it performs well on benchmarks such as MMLU and HellaSwag relative to its size.
Gemma 2 9B is a lightweight, state-of-the-art, decoder-only large language model developed by Google, built from the same research and technology used to create the larger Gemini models. It represents a significant step toward making capable language models usable in resource-constrained environments.
The model was trained on an extensive dataset of approximately 8 trillion tokens, encompassing web documents, code, and mathematical text. The training data underwent rigorous preprocessing, including CSAM filtering and removal of sensitive data. The weights are released in bfloat16 precision; they can also be loaded in float32, though this adds no precision because the underlying weights remain bfloat16.
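As a minimal sketch of loading the released bfloat16 weights with Hugging Face Transformers (the google/gemma-2-9b checkpoint name, device placement, and prompt below are illustrative assumptions, not requirements):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2-9b"  # assumed base checkpoint on Hugging Face

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights ship in bfloat16; omitting this loads float32 without gaining precision
    device_map="auto",           # spread layers across available GPU(s)
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```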
A key technical feature is the model's support for optimization: inference speed can be improved by up to 6x using torch.compile. For conversational use, the model ships a chat template that can be applied through the tokenizer's apply_chat_template method.
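The snippet below is a hedged sketch of both features: it applies the tokenizer's chat template and opts into torch.compile. It assumes the instruction-tuned google/gemma-2-9b-it checkpoint; the actual speedup depends on hardware, cache configuration, and PyTorch version.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2-9b-it"  # assumed instruction-tuned variant for chat use
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Optional: compile the forward pass with a static KV cache. The first generation
# is slow while kernels compile; subsequent calls benefit from the speedup.
model.generation_config.cache_implementation = "static"
model.forward = torch.compile(model.forward, mode="reduce-overhead", fullgraph=True)

# Build a chat-formatted prompt with apply_chat_template.
messages = [{"role": "user", "content": "Write a haiku about GPUs."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```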
Gemma 2 9B's performance has been evaluated across multiple benchmarks, including MMLU and HellaSwag. While specific numerical results aren't reproduced here, the model performs strongly across these benchmarks, though slightly below its larger 27B parameter sibling. That performance-to-size tradeoff makes it particularly valuable where computational resources are limited but high-quality language processing is still required.
The model is designed for a variety of text generation tasks, such as question answering, summarization, and reasoning.
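For a quick end-to-end example of one such task, the sketch below uses the high-level Transformers pipeline API; the checkpoint name, prompt, and generation settings are assumptions chosen for illustration.

```python
import torch
from transformers import pipeline

# Illustrative setup: question answering phrased as plain text generation.
generator = pipeline(
    "text-generation",
    model="google/gemma-2-9b-it",  # assumed instruction-tuned checkpoint
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "Question: Why does bfloat16 halve memory use compared to float32?\nAnswer:"
result = generator(prompt, max_new_tokens=80, return_full_text=False)
print(result[0]["generated_text"])
```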
Ethical considerations have been addressed through structured evaluations and internal red-teaming, focusing on areas such as content safety, representational harms, and memorization of training data.
The model has been verified to meet internal safety policies, though users should be aware of limitations common to large language models, including biases inherited from the training data and the possibility of factually incorrect output.