Model Report
Google / Gemma 3 27B
Gemma 3 27B is a multimodal generative AI model developed by Google DeepMind that processes both text and image inputs to produce text outputs. Built on a decoder-only transformer architecture with 27 billion parameters, it incorporates a SigLIP vision encoder and supports context lengths up to 128,000 tokens. The model was trained on over 14 trillion tokens and demonstrates competitive performance across language, coding, mathematical reasoning, and vision-language tasks.
Gemma 3 27B is a multimodal generative AI model developed by Google DeepMind as part of the open-weight Gemma model family. It processes both text and image inputs and produces text outputs, and it builds on the same advances in architecture, efficiency, and training that underpin the Gemini line of models. Released in 2025, it is available in both pre-trained and instruction-tuned variants and is aimed at research and development communities seeking state-of-the-art performance in natural language processing and vision-language understanding.
Example of a high-resolution image input to Gemma 3 27B, used in a demonstration of detailed image description. Prompt: 'Describe this image in detail.'
Gemma 3 27B employs a decoder-only transformer architecture that integrates key recent innovations in large language models. Its multimodal capability is enabled by a 400-million-parameter SigLIP vision transformer encoder, which converts input images, resized to 896x896 pixels, into soft visual tokens that the language model processes alongside text. This vision encoder is kept frozen during training and is shared across all larger Gemma 3 variants (4B, 12B, and 27B parameters).
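As a rough illustration of this image-to-token path, the sketch below resizes a batch of 896x896 images into a grid of patch embeddings, pools them down to the fixed set of 256 soft vectors described in the next paragraph, and projects them into the language model's embedding space. The patch size, hidden widths, pooling choice, and embedding width are placeholder assumptions, not the published configuration; only the 896x896 input resolution and the 256-token budget come from the text.

```python
import torch
import torch.nn as nn

# Illustrative sketch only: the real (frozen) SigLIP encoder is far larger.
IMAGE_SIZE = 896        # images are resized to 896x896 pixels
PATCH_SIZE = 14         # placeholder patch size -> 64x64 patch grid
VISION_DIM = 1152       # placeholder vision hidden width
NUM_SOFT_TOKENS = 256   # each image becomes a fixed set of 256 soft vectors
LM_DIM = 4096           # placeholder language-model embedding width

class VisionToSoftTokens(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        # Patch embedding stands in for the frozen SigLIP vision transformer.
        self.patch_embed = nn.Conv2d(3, VISION_DIM, kernel_size=PATCH_SIZE, stride=PATCH_SIZE)
        # Pool the 64x64 patch grid down to 16x16 = 256 positions.
        self.pool = nn.AdaptiveAvgPool2d(16)
        # Project into the language model's embedding space.
        self.proj = nn.Linear(VISION_DIM, LM_DIM)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        feats = self.patch_embed(images)          # (B, VISION_DIM, 64, 64)
        feats = self.pool(feats)                  # (B, VISION_DIM, 16, 16)
        feats = feats.flatten(2).transpose(1, 2)  # (B, 256, VISION_DIM)
        return self.proj(feats)                   # (B, 256, LM_DIM) soft tokens

soft_tokens = VisionToSoftTokens()(torch.randn(1, 3, IMAGE_SIZE, IMAGE_SIZE))
assert soft_tokens.shape[1] == NUM_SOFT_TOKENS
print(soft_tokens.shape)
```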
To manage visual data efficiently at inference time, Gemma 3 27B uses a Pan & Scan strategy, segmenting images into non-overlapping crops and normalizing them to handle diverse aspect ratios and resolutions. Each image is embedded into a fixed set of 256 vectors, significantly reducing computational cost. The model supports context lengths up to 128,000 tokens, using a combination of local and global self-attention layers for efficient long-context reasoning. Global layers, interleaved with local layers at a 1:5 ratio, employ RoPE positional encodings with a raised base frequency, enabling the model to process long sequences efficiently.
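To make the interleaving concrete, the snippet below generates a hypothetical attention-layer schedule with one global layer after every five local sliding-window layers. The total depth and window size are placeholder values; only the 1:5 ratio comes from the description above.

```python
# Minimal sketch of a 1:5 global-to-local attention schedule.
NUM_LAYERS = 60       # placeholder depth, not the official figure
LOCAL_WINDOW = 1024   # placeholder sliding-window span, in tokens

def layer_schedule(num_layers: int) -> list[str]:
    """One global self-attention layer after every five local sliding-window layers."""
    return ["global" if (i + 1) % 6 == 0 else "local" for i in range(num_layers)]

schedule = layer_schedule(NUM_LAYERS)
print(schedule[:12])
# ['local', 'local', 'local', 'local', 'local', 'global', 'local', ...]
print(schedule.count("global"), "global layers;", schedule.count("local"), "local layers")
```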
For text, Gemma 3 27B uses a SentencePiece tokenizer with a 262,000-token vocabulary, supporting more balanced and effective encoding across over 140 languages.
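As a usage-level illustration, the snippet below loads the published tokenizer through Hugging Face transformers and inspects its vocabulary. It assumes a recent transformers release with Gemma 3 support and that access to the gated google/gemma-3-27b-it repository has been granted; the sample strings are arbitrary.

```python
# Sketch only: assumes `transformers` with Gemma 3 support and an accepted
# license for the gated "google/gemma-3-27b-it" checkpoint on the Hub.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-27b-it")
print(tokenizer.vocab_size)  # on the order of 262k entries

# The large multilingual vocabulary keeps token counts comparatively low across scripts.
for text in ["The quick brown fox", "Le renard brun rapide", "素早い茶色の狐"]:
    print(f"{text!r} -> {len(tokenizer.encode(text))} tokens")
```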
Training Data, Optimization, and Techniques
Gemma 3 27B was pre-trained on a data corpus exceeding 14 trillion tokens, drawing on diverse sources such as web documents, programming code, academic mathematics content, and a significant share of multilingual data (both monolingual and parallel sources) to broaden language coverage. The data mixture and filtering are designed to optimize for safety, decontamination, and performance across text and image modalities, with strategies for removing sensitive information and increasing the representation of under-represented languages. Quality weighting, inspired by recent best practices, further refines the data selection.
The training process employs knowledge distillation, in which the student model is trained on soft token distributions produced by a higher-capacity teacher, improving learning efficiency and output quality. A novel post-training procedure integrates reinforcement learning (using refined versions of the BOND, WARM, and WARP algorithms), human feedback, and execution-based scoring for tasks such as code and math, enhancing the model's ability to follow instructions, solve mathematical problems, and reason across disciplines. Quantization-aware training yields both bf16 and lower-precision (int4, SFP8) checkpoints, extending the model's versatility for deployment.
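The PyTorch fragment below is a generic sketch of distillation on soft teacher distributions, in the spirit of the description above. The temperature, tensor shapes, and plain KL-divergence objective are illustrative assumptions rather than the exact Gemma 3 recipe.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 1.0) -> torch.Tensor:
    """KL divergence between teacher and student next-token distributions."""
    t_log_probs = F.log_softmax(teacher_logits / temperature, dim=-1)
    s_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # kl_div takes log-probabilities for the input and, with log_target=True,
    # log-probabilities for the target as well.
    return F.kl_div(s_log_probs, t_log_probs, log_target=True,
                    reduction="batchmean") * temperature ** 2

# Toy example: 16 token positions over a 262,144-entry vocabulary.
vocab_size = 262_144
student = torch.randn(16, vocab_size, requires_grad=True)
teacher = torch.randn(16, vocab_size)
loss = distillation_loss(student, teacher)
loss.backward()
print(float(loss))
```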
Gemma 3 27B demonstrates robust performance improvements over prior iterations and remains competitive with models of similar and larger scale. In the LMSYS Chatbot Arena, Gemma 3 27B-IT achieved an Elo score of 1338, ranking in the top 10 and surpassing other open models such as DeepSeek-V3 and LLaMA 3 405B. These assessments, however, capture only text-based performance and do not measure multimodal capabilities.
On a wide range of established benchmarks, Gemma 3 27B achieves the following results:
On MMLU-Pro, it scores 67.5, marking a substantial gain from its predecessor and approaching proprietary models like Gemini 1.5 Pro.
In code generation and reasoning benchmarks, including LiveCodeBench (29.7) and MATH (89.0), Gemma 3 27B consistently outperforms Gemma 2 27B.
For visual-language tasks, the model reaches high scores on COCO Caption, DocVQA, and TextVQA, reflecting improvements resulting from its high-resolution SigLIP vision encoder and advanced Pan & Scan inference method. For example, performance improvements on DocVQA (+4.8), InfoVQA (+17.0), and TextVQA (+1.6) are attributed directly to this architecture.
Multilingual assessment demonstrates enhanced coverage, with improved results on benchmarks such as MGSM, Global-MMLU-Lite, and WMT24++, reflecting the model's revised and diversified training mixture.
Long-context reasoning remains effective up to 128,000 tokens, though accuracy diminishes rapidly at the upper context limits.
A sample image used with Gemma 3 27B to demonstrate visual question answering. Prompt: 'What animal is on the candy?'
Gemma 3 27B is designed for a broad spectrum of tasks at the intersection of language and vision. In natural language processing, it is suited for creative text generation, conversational agents, summarization, question answering, code generation, and mathematical reasoning. Multimodal capabilities enable extraction and interpretation of information from images—such as running OCR-style tasks, answering questions about document layouts, or summarizing visual scenes.
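A minimal visual question answering sketch using the Hugging Face transformers integration is shown below. It assumes a recent transformers release with Gemma 3 support, an accepted license for the gated checkpoint, enough GPU memory (or offloading) for the 27B weights, and a placeholder image URL.

```python
# Sketch only: model class and chat-template usage follow the Hugging Face
# integration for Gemma 3; the image URL is a placeholder.
import torch
from transformers import AutoProcessor, Gemma3ForConditionalGeneration

model_id = "google/gemma-3-27b-it"
model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id, device_map="auto", torch_dtype=torch.bfloat16
).eval()
processor = AutoProcessor.from_pretrained(model_id)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "https://example.com/candy.jpg"},  # placeholder
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device, dtype=torch.bfloat16)

with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=128, do_sample=False)

# Strip the prompt tokens before decoding the model's answer.
answer = processor.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(answer)
```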
The model is also applicable to research in language modeling, multilingual NLP, and multimodal AI, providing a strong foundation for experimentation and further development. Tools such as ShieldGemma 2, trained with the Gemma 3 architecture, extend the model family's utility to domains like image safety classification, outputting labels relevant to content safety and moderation.
Limitations and Considerations
While Gemma 3 27B makes advances in numerous technical areas, several limitations remain. Despite rigorous data filtering, there is residual risk of contamination in evaluation probes, complicating some comparative analyses. The model's factual accuracy is grounded in the statistical patterns of the data it was trained on, and it is not a source of verified knowledge; as such, it can generate outdated or incorrect statements. Challenges persist around common-sense reasoning and subtle language nuance, and, like other large language models, it can manifest and propagate socio-cultural biases present in real-world datasets. Safety evaluations have primarily centered on English, and performance in sensitive domains (such as CBRN knowledge) is limited.
Release Information and Licensing
Gemma 3 27B and related models were made public with the Gemma 3 Technical Report dated March 12, 2025. Users are required to review and consent to Google's usage license, with specific usage restrictions detailed in the Gemma Prohibited Use Policy. These terms and further guidance regarding responsible generative AI use can be found through Google's official resources.
Related Models and Comparisons
The Gemma 3 family includes models at 1B, 4B, 12B, and 27B parameter scales. All but the 1B variant incorporate the shared SigLIP vision encoder, and only the larger models support the extended 128K-token context window. Gemma 3 models consistently outperform their Gemma 2 counterparts across linguistic, coding, reasoning, and vision-language assessments, and the much smaller instruction-tuned Gemma 3 4B is competitive with the Gemma 2 27B-IT model.