LLaMA 33B is a member of the Large Language Model Meta AI (LLaMA) family, a series of foundation language models developed by Meta AI. Publicly introduced in 2023, the LLaMA models aim to broaden access to high-quality language modeling by providing models at several parameter scales that perform competitively while remaining usable with more modest computational resources. LLaMA 33B comprises approximately 32.5 billion parameters and has become a reference point in the landscape of open large language models.
Model Architecture
LLaMA 33B, like its sibling models, is built on the transformer architecture that underpins most contemporary state-of-the-art language models, with several modifications aimed at improving training stability and efficiency. Pre-normalization is used: the input of each transformer sub-layer is normalized with RMSNorm, rather than normalizing the output, which stabilizes optimization. In addition, the SwiGLU activation function replaces the canonical ReLU in the feed-forward layers, with the hidden dimension scaled to 2/3 of 4d to keep the parameter count comparable while improving representational capacity.
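The following is a minimal PyTorch sketch, not Meta AI's implementation, of pre-normalization with RMSNorm and a SwiGLU feed-forward block with the 2/3·4d hidden dimension; the tensor sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square layer normalization (no mean centering, no bias)."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Scale by the reciprocal root mean square of the activations.
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight

class SwiGLUFeedForward(nn.Module):
    """Feed-forward block with SwiGLU gating; hidden size is 2/3 of 4*dim."""
    def __init__(self, dim: int):
        super().__init__()
        hidden = int(2 * (4 * dim) / 3)          # 2/3 * 4d, as described above
        self.w_gate = nn.Linear(dim, hidden, bias=False)
        self.w_up = nn.Linear(dim, hidden, bias=False)
        self.w_down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SwiGLU: silu(x W_gate) multiplied elementwise by (x W_up).
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))

# Pre-normalization: the norm is applied to the sub-layer input,
# and the residual connection bypasses it.
dim = 512                                        # illustrative; LLaMA 33B uses 6656
norm, ffn = RMSNorm(dim), SwiGLUFeedForward(dim)
x = torch.randn(2, 16, dim)                      # (batch, sequence, dim)
y = x + ffn(norm(x))                             # residual around the normalized sub-layer
```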
Another distinguishing feature of the LLaMA models, including the 33B variant, is the replacement of absolute positional embeddings with rotary positional embeddings (RoPE), applied at every layer of the network. RoPE encodes relative position directly in the attention computation, improving the modeling of sequence order without significant computational overhead.
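A compact sketch of rotary embeddings follows; it uses the split-half channel-pairing convention and illustrative shapes, and is not the original LLaMA code.

```python
import torch

def rotary_embedding(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary positional embeddings to a (batch, seq, heads, head_dim) tensor.

    Pairs of channels are rotated by a position-dependent angle, so relative
    positions are encoded directly in the query/key dot products.
    """
    batch, seq_len, n_heads, head_dim = x.shape
    half = head_dim // 2
    # Per-pair rotation frequencies, geometrically spaced as in RoPE.
    freqs = base ** (-torch.arange(0, half, dtype=torch.float32) / half)
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * freqs[None, :]
    cos = angles.cos()[None, :, None, :]   # broadcast over batch and heads
    sin = angles.sin()[None, :, None, :]
    x1, x2 = x[..., :half], x[..., half:]
    # 2-D rotation of each (x1, x2) channel pair.
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

# Applied to queries and keys (not values) inside every attention layer.
q = torch.randn(1, 128, 8, 64)
k = torch.randn(1, 128, 8, 64)
q_rot, k_rot = rotary_embedding(q), rotary_embedding(k)
```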
Implementation-level optimizations further contribute to efficient operation. LLaMA 33B uses an efficient causal multi-head attention implementation that avoids materializing the full attention-weight matrix, reducing memory use for long context windows, combined with activation checkpointing and model and sequence parallelism to accelerate training across distributed hardware.
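The original implementation relies on specialized attention kernels; the sketch below conveys the same two ideas, fused causal attention and activation checkpointing, using standard PyTorch utilities rather than the exact kernels used in training.

```python
import torch
import torch.nn.functional as F
from torch.utils.checkpoint import checkpoint

def causal_attention(q, k, v):
    # Fused causal attention: the full (seq x seq) attention-weight matrix is
    # never materialized and future positions are masked, reducing memory use.
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)

# Tensors are (batch, heads, seq_len, head_dim); sizes here are illustrative.
q = torch.randn(2, 8, 1024, 64, requires_grad=True)
k = torch.randn(2, 8, 1024, 64, requires_grad=True)
v = torch.randn(2, 8, 1024, 64, requires_grad=True)

# Activation checkpointing: intermediate activations are discarded during the
# forward pass and recomputed in the backward pass, trading compute for memory.
out = checkpoint(causal_attention, q, k, v, use_reentrant=False)
out.sum().backward()
```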
Training Data and Methods
LLaMA 33B was trained on a corpus of roughly 1.4 trillion tokens drawn exclusively from publicly available data, in contrast to several other large language models that partially rely on proprietary content. Major components of this dataset include multi-year CommonCrawl web snapshots, the C4 corpus, filtered GitHub code, Wikipedia in 20 languages, public-domain books, scientific articles from arXiv, and curated Stack Exchange question-answer data. Each source underwent deduplication, quality filtering, and language identification to maximize corpus quality and diversity.
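The weights below are the approximate sampling proportions reported for the LLaMA pretraining mixture; the sampling helper is purely illustrative and not part of the released code.

```python
import random

# Approximate sampling proportions of the pretraining mixture reported for the
# LLaMA models; values are rounded and shown for illustration only.
DATA_MIXTURE = {
    "CommonCrawl":   0.670,
    "C4":            0.150,
    "GitHub":        0.045,
    "Wikipedia":     0.045,
    "Books":         0.045,
    "arXiv":         0.025,
    "StackExchange": 0.020,
}

rng = random.Random(0)

def sample_source() -> str:
    """Pick the data source for the next training document according to the mixture."""
    return rng.choices(list(DATA_MIXTURE), weights=list(DATA_MIXTURE.values()), k=1)[0]

# Over many draws the empirical frequencies approach the target proportions.
counts = {name: 0 for name in DATA_MIXTURE}
for _ in range(10_000):
    counts[sample_source()] += 1
```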
The input data was tokenized with byte-pair encoding (BPE), using the SentencePiece implementation: numeric strings are split into individual digits, and characters outside the vocabulary fall back to byte-level tokens. Training made a single pass over most corpora, with the exception of Wikipedia and the books data, over which approximately two epochs were performed to increase exposure to high-quality text.
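A sketch of how such a tokenizer can be configured with SentencePiece is shown below; the corpus path is a placeholder, and the 32,000-token vocabulary matches the size used by LLaMA.

```python
import sentencepiece as spm

# Train a BPE tokenizer with the behaviours described above: digits split
# individually, and unknown UTF-8 characters decomposed into byte tokens.
spm.SentencePieceTrainer.train(
    input="corpus.txt",            # placeholder path to raw training text
    model_prefix="llama_like_bpe",
    model_type="bpe",
    vocab_size=32000,
    split_digits=True,             # "1234" -> "1", "2", "3", "4"
    byte_fallback=True,            # out-of-vocabulary characters fall back to bytes
    character_coverage=0.99995,
)

sp = spm.SentencePieceProcessor(model_file="llama_like_bpe.model")
print(sp.encode("Price: 1234 units", out_type=str))
```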
The model was trained with the AdamW optimizer and a cosine learning-rate schedule, with weight decay for regularization and gradient clipping for stability. Additional optimizations included an efficient implementation of the transformer backward pass and overlapping computation with inter-GPU communication to sustain high throughput across compute clusters. The scale and training methodology of LLaMA 33B were chosen to balance resource demands against performance gains, in line with the project's aim of enabling broad-access research.
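A minimal PyTorch sketch of this optimization setup follows, using the hyperparameter values reported for LLaMA 33B (peak learning rate 1.5e-4, betas 0.9/0.95, weight decay 0.1, gradient clipping at 1.0, warmup followed by cosine decay to 10% of the peak rate); the model and step counts are placeholders.

```python
import math
import torch

model = torch.nn.Linear(512, 512)              # placeholder for the transformer
peak_lr, warmup_steps, total_steps = 1.5e-4, 2000, 100_000

optimizer = torch.optim.AdamW(
    model.parameters(), lr=peak_lr, betas=(0.9, 0.95), weight_decay=0.1
)

def lr_scale(step: int) -> float:
    """Linear warmup, then cosine decay to 10% of the peak learning rate."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.1 + 0.9 * 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_scale)

for step in range(10):                         # abbreviated training loop
    loss = model(torch.randn(8, 512)).pow(2).mean()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    optimizer.zero_grad()
    scheduler.step()
```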
Evaluation and Performance
LLaMA 33B has been empirically evaluated across a spectrum of language modeling, reasoning, and comprehension tasks. In zero-shot and few-shot settings, the model demonstrates competitive performance on major benchmarks, including BoolQ, PIQA, SIQA, and HellaSwag, along with knowledge-intensive tasks such as NaturalQuestions, TriviaQA, RACE, and MMLU.
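Zero-shot evaluation of such multiple-choice benchmarks is typically performed by ranking candidate answers by their likelihood under the model; the sketch below illustrates this common procedure with a hypothetical log_likelihood helper standing in for a real model call.

```python
from typing import Callable, Sequence

def pick_answer(
    prompt: str,
    candidates: Sequence[str],
    log_likelihood: Callable[[str, str], float],
    length_normalize: bool = True,
) -> int:
    """Return the index of the candidate completion the model scores highest."""
    scores = []
    for cand in candidates:
        score = log_likelihood(prompt, cand)   # hypothetical model call
        if length_normalize:
            score /= max(1, len(cand))         # normalize by completion length
        scores.append(score)
    return max(range(len(candidates)), key=scores.__getitem__)
```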
Performance in code generation tasks is captured by HumanEval and MBPP, where LLaMA 33B achieves notable scores at both “pass@1” and “pass@100” evaluation settings. Mathematical reasoning assessments, using datasets such as MATH and GSM8k, illustrate relative strengths and ongoing challenges typical of language models of its class.
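The pass@k metric estimates the probability that at least one of k generated samples passes a problem's unit tests; the standard unbiased estimator introduced with the HumanEval benchmark is shown below.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of pass@k given n generated samples, c of which are correct."""
    if n - c < k:
        return 1.0                       # every size-k subset contains a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Illustrative numbers: 200 samples per problem, 30 of which pass the tests.
print(pass_at_k(200, 30, 1))    # 0.15
print(pass_at_k(200, 30, 100))  # close to 1.0
```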
While LLaMA 33B often matches or surpasses significantly larger models on specific linguistic and reasoning tasks, certain domains, including social-interaction reasoning and nuanced common-sense inference, remain areas for improvement. These benchmark results provide insight into the model's capabilities but also underscore the need for domain-specific evaluation and continued assessment as language modeling research advances.
Limitations and Ethical Considerations
Despite its open-data approach and technical strengths, LLaMA 33B exhibits limitations common to large language models. Evaluations with RealToxicityPrompts show that it can generate toxic or biased content, with studies noting that toxicity tends to increase with model size. Evaluations on the CrowS-Pairs and WinoGender datasets reveal persistent biases related to sensitive attributes such as religion, gender, and age.
On truthfulness benchmarks such as TruthfulQA, LLaMA models outperform some earlier models in truthfulness and informativeness, but absolute performance remains low, and the model remains prone to producing fabricated or misleading answers ("hallucinations"). Training on large-scale public datasets, while enhancing transparency, does not eliminate the inappropriate, erroneous, or offensive material the model may have ingested.
With regard to environmental impact, training LLaMA 33B was reported to consume approximately 233 megawatt-hours of energy, corresponding to an estimated 90 tonnes of CO2-equivalent emissions based on US-average carbon intensity at the time of publication. Although releasing the trained model enables further study without retraining, questions of resource efficiency and sustainable AI development persist.
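The reported emission estimate follows from multiplying energy use by a grid carbon-intensity factor; the brief calculation below assumes the 0.385 kgCO2eq/kWh US-average figure cited in the LLaMA paper.

```python
# Back-of-the-envelope check: energy use times carbon intensity gives the
# CO2-equivalent estimate for LLaMA 33B.
energy_mwh = 233                       # reported training energy
carbon_intensity = 0.385               # tCO2eq per MWh (= kgCO2eq per kWh), assumed US average

emissions_tco2eq = energy_mwh * carbon_intensity
print(f"{emissions_tco2eq:.0f} tCO2eq")   # ~90 tCO2eq, matching the reported estimate
```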
Model Access and Licensing
LLaMA models, including LLaMA 33B, are released under a noncommercial research license. Access is provided selectively, aiming to prioritize academic researchers, institutions, and organizations devoted to scientific advancement and societal good. This licensing approach reflects both Meta AI's commitment to responsible AI research and the practical necessity of managing potential model misuse and risks.
Conclusion
LLaMA 33B serves as a cornerstone within the broader LLaMA model family, offering a detailed case study in the creation of performant, accessible, and transparent large language models. Its training strategy, architectural enhancements, and evaluation outcomes have contributed to a greater understanding of the potential and responsibility associated with state-of-the-art language models in fundamental and applied AI research.