Note: Command R+ v01 weights are released under a CC-BY-NC 4.0 license and may not be used for commercial purposes. Please read the license to verify whether your use case is permitted.
Model Report
Cohere / Command R+ v01
Command R+ v01 is a 104-billion parameter open-weights language model developed by Cohere, optimized for retrieval-augmented generation, tool use, and multilingual tasks. The model features a 128,000-token context window and specializes in generating outputs with inline citations from retrieved documents. It supports automated tool calling, demonstrates competitive performance across standard benchmarks, and includes efficient tokenization for non-English languages, making it suitable for enterprise applications requiring factual accuracy and transparency.
Command R+ is a large-scale generative language model developed by Cohere and Cohere Labs, released as an open-weights research artifact with 104 billion parameters. Built for enterprise and research applications, it targets Retrieval-Augmented Generation (RAG), tool-use automation, code interaction, and multilingual tasks, aiming to deliver efficiency and accuracy for production-scale deployments. The model supports an extensive context window and features that facilitate grounded, transparent, and reliable outputs across a wide range of domains.
Command R+ headline image from Cohere Labs announcing the model's enterprise focus and research capabilities.
Command R+ is built as an auto-regressive transformer-based language model featuring a context window of 128,000 tokens. The model architecture combines extensive pretraining with supervised fine-tuning and preference alignment to optimize for tasks requiring factual accuracy and safe, helpful behavior. Command R+ operates in a text-in, text-out paradigm, accepting and generating only textual content.
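As a concrete starting point, the following is a minimal usage sketch assuming the open-weights checkpoint published on Hugging Face (the `CohereForAI/c4ai-command-r-plus` repository id and generation settings here are illustrative, not prescriptive) and a transformers release with Command R+ support:

```python
# Minimal text-in, text-out sketch for Command R+.
# Assumes a recent transformers release and the CohereForAI/c4ai-command-r-plus
# checkpoint on Hugging Face; 104B parameters require substantial GPU memory.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CohereForAI/c4ai-command-r-plus"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Command R+ expects its structured chat template; apply_chat_template renders
# the conversation and appends the generation prompt for the assistant turn.
messages = [{"role": "user", "content": "Explain retrieval-augmented generation in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True))
```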
A distinguishing aspect of Command R+ is its optimization for Retrieval-Augmented Generation (RAG). The model can integrate retrieved document snippets into its outputs, providing in-line citations to improve factual grounding and minimize hallucinations. It supports two modes for grounded generation: an "accurate" mode that sequences document selection, citation prediction, and span insertion, and a "fast" mode that expedites output with reduced grounding precision.
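As a hedged sketch of how these modes can be driven in practice: some transformers releases for Command R models expose an `apply_grounded_generation_template` helper on the tokenizer that renders retrieved snippets and the requested citation mode into the prompt (newer releases fold document grounding into `apply_chat_template`). Continuing from the sketch above, with illustrative document contents:

```python
# Grounded-generation prompt rendering — a sketch reusing the model and
# tokenizer loaded above. Document snippets are illustrative; a real system
# would supply them from a retriever.
documents = [
    {"title": "Solar FAQ", "text": "Photovoltaic cells convert sunlight directly into electricity."},
    {"title": "Grid basics", "text": "Inverters convert panels' DC output into grid-compatible AC."},
]
conversation = [{"role": "user", "content": "How do rooftop solar systems feed power into the grid?"}]

grounded_ids = tokenizer.apply_grounded_generation_template(
    conversation,
    documents=documents,
    citation_mode="accurate",  # or "fast" for quicker, less precise grounding
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(grounded_ids, max_new_tokens=512, do_sample=False)
# The response cites spans back to the numbered documents above.
print(tokenizer.decode(output[0][grounded_ids.shape[1]:], skip_special_tokens=True))
```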
Advanced tool-use capabilities enable automation by allowing the model to select, call, and interact with external functions—typically APIs, databases, or search engines. Command R+ can perform both single-step (function calling) and multi-step tool use (agent workflows), leveraging iterative action-observation-reflection cycles to accomplish complex, multi-action tasks.
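A single-step call can be sketched with the tokenizer's `apply_tool_use_template` helper (again, available in some transformers releases; the `internet_search` schema below is a hypothetical definition). The model responds with a JSON action plan that the calling application must parse and execute:

```python
# Single-step tool use — sketch reusing the model and tokenizer from above.
# The tool schema is hypothetical; Command R+ emits a JSON action plan, and
# executing the chosen tool is the caller's responsibility.
tools = [
    {
        "name": "internet_search",  # hypothetical tool
        "description": "Returns relevant document snippets for a textual query.",
        "parameter_definitions": {
            "query": {
                "description": "Query to search the internet with",
                "type": "str",
                "required": True,
            }
        },
    }
]
conversation = [{"role": "user", "content": "What is the population of Reykjavik?"}]

tool_ids = tokenizer.apply_tool_use_template(
    conversation,
    tools=tools,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(tool_ids, max_new_tokens=256, do_sample=False)
# Expected response shape: Action: ```json [{"tool_name": ..., "parameters": ...}]```
print(tokenizer.decode(output[0][tool_ids.shape[1]:], skip_special_tokens=True))
```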
The model was further developed for multilingual coverage, showing evaluated performance in ten languages, with pretraining data spanning over twenty languages. Its tokenizer is specifically engineered to reduce operational costs for non-English content by encoding such text with fewer tokens relative to other models.
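One quick, hedged way to observe this effect locally is to count tokens for comparable sentences across languages; the sentences below are illustrative, and absolute counts will vary by tokenizer version:

```python
# Rough tokenizer-efficiency check, reusing the tokenizer loaded above.
# Fewer tokens for the same content means lower cost and more usable context.
samples = {
    "English": "The weather is very nice today.",
    "Japanese": "今日はとても良い天気です。",
    "Arabic": "الطقس جميل جدا اليوم.",
}
for lang, text in samples.items():
    n_tokens = len(tokenizer.encode(text, add_special_tokens=False))
    print(f"{lang}: {n_tokens} tokens")
```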
Additionally, Command R+ is suitable for code-related tasks, capable of generating, explaining, and rewriting code snippets, especially when using low-temperature or greedy decoding strategies.
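Continuing the first sketch above, decoding settings for code tasks might look like the following; the specific values are illustrative rather than official recommendations:

```python
# Decoding choices for code tasks, reusing model/input_ids from the first sketch.
# Greedy decoding — deterministic, usually preferable for code rewriting:
output = model.generate(input_ids, max_new_tokens=256, do_sample=False)

# Low-temperature sampling — slightly more varied but still conservative:
output = model.generate(
    input_ids, max_new_tokens=256, do_sample=True, temperature=0.2, top_p=0.9
)
```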
Dashboard comparing Command R+ to other LLMs in translation quality, RAG accuracy, tool use success, and token cost effectiveness.
Command R+ has been evaluated across standardized benchmarks and specialized human assessments, highlighting its strengths in RAG, tool use, and multilingual understanding. On the Open LLM Leaderboard, Command R+ demonstrates competitive results for academic metrics such as ARC Challenge, HellaSwag, MMLU, Winogrande, GSM8k, and TruthfulQA, with an average benchmark score of 74.6.
Human evaluations substantiate the model’s RAG capabilities: in head-to-head preference assessments of document-grounded summarization with citations, Command R+ compares favorably against the other evaluated models. These evaluations consider not only citation quality but also overall text fluency and utility for open-ended generation.
Charts demonstrating Command R+'s performance in human preferences for citation-based summarization and multi-step reasoning with external search tools.
Analyses using ToolTalk (Hard) and the Berkeley Function Calling Leaderboard indicate proficiency in conversational tool use and single-turn function calling. These assessments measure the model's rate of successful function execution within dynamic agent environments.
Command R+ evaluated on ToolTalk (Hard) and Berkeley Function Calling, showing high rates of conversational tool success and function pass compared to leading models.
Command R+ also exhibits multilingual translation performance as evaluated on FLoRES and WMT23 tasks, demonstrating competitive BLEU scores for both English-to-L2 and L2-to-English scenarios over a range of languages.
Command R+ multilingual evaluation results across FLoRES and WMT23 translation tasks, compared to Claude 3 Sonnet, [Mistral-Large 2](https://openlaboratory.ai/models/mistral-large-2), and GPT4-turbo.
The model’s tokenizer demonstrates efficiency advantages for non-English applications, reducing required token counts—and thus operational costs—relative to Mistral and OpenAI tokenizers for various world languages.
Chart showing token cost multipliers for various tokenizers across non-English languages, highlighting Cohere's tokenizer efficiency in multilingual workloads.
Command R+ leverages an optimized large transformer architecture, pretrained on vast corpora encompassing multiple domains and languages. Its training pipeline integrates supervised fine-tuning (SFT) and preference training, which incorporates human feedback to enhance model helpfulness, factuality, and safety. The model’s RAG and tool-use specializations are a result of targeted fine-tuning, equipping it for document-grounded tasks and robust function-calling protocols.
Extensive multilingual support is achieved through both pretraining—drawing from resources covering more than twenty languages—and tokenizer engineering, yielding improved tokenization efficiency for non-Latin scripts and underrepresented languages.
Applications and Use Cases
Command R+ is tailored for enterprise tasks and research applications where reliability and factual accuracy are priorities. Primary use cases include open-domain and document-grounded question answering, complex summarization with cited sources, agent-based workflows involving external tool invocation, and textual interactions involving code review, generation, and rewriting.
Its multilingual capabilities allow for deployment in global, cross-lingual environments. The model also supports automation scenarios, such as updating CRM records or orchestrating business workflows requiring sequential tool calls and reasoning steps. By supporting RAG with accurate, retrievable source citations, Command R+ can be used for applications demanding transparency and verifiability in responses.
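A multi-step workflow can be sketched as a bounded action-observation loop around the tool-use template. The parsing and dispatch helpers below are hypothetical, and the loop assumes the model emits its plan as a fenced JSON action block; real deployments should follow Cohere's documented multi-step prompt format:

```python
# Hypothetical multi-step agent loop (action -> observation -> reflection),
# reusing the model, tokenizer, and `tools` schema from the sketches above.
import json
import re

def run_tool(name: str, params: dict) -> str:
    # Hypothetical dispatcher; a real system would call APIs or databases here.
    return f"(stub result for {name} with {params})"

def extract_actions(generation: str) -> list:
    # Assumes the model emits: Action: ```json [ ... ]```
    match = re.search(r"```json\s*(\[.*?\])\s*```", generation, re.DOTALL)
    return json.loads(match.group(1)) if match else []

conversation = [{"role": "user", "content": "Find our top customer and update their CRM record."}]
for _ in range(5):  # bound the number of tool-use steps
    prompt_ids = tokenizer.apply_tool_use_template(
        conversation, tools=tools, tokenize=True,
        add_generation_prompt=True, return_tensors="pt",
    ).to(model.device)
    generation = tokenizer.decode(
        model.generate(prompt_ids, max_new_tokens=256, do_sample=False)[0][prompt_ids.shape[1]:],
        skip_special_tokens=True,
    )
    actions = extract_actions(generation)
    if not actions:  # no tool call means the model has produced its final answer
        break
    for action in actions:  # execute each requested tool and feed results back
        observation = run_tool(action["tool_name"], action["parameters"])
        # Simplification: real Command R+ prompts wrap tool results in a
        # dedicated results block rather than a plain chat turn.
        conversation.append({"role": "system", "content": f"Tool result: {observation}"})
```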
For those requiring quantization, Command R+ is also available in 4-bit format to balance performance with memory efficiency.
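For local experimentation, one option is on-the-fly 4-bit loading with bitsandbytes through transformers, sketched below with illustrative settings (Cohere also publishes a separately pre-quantized 4-bit checkpoint):

```python
# 4-bit loading sketch using bitsandbytes quantization via transformers.
# Values are illustrative; VRAM needs still depend on hardware and context length.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "CohereForAI/c4ai-command-r-plus"  # assumed Hugging Face repo id
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
```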
Limitations
While Command R+ achieves strong results across RAG, tool-use, and multilingual benchmarks, certain limitations are noted. The model is less optimized for pure code completion than code-specialized models, and standard LLM benchmarks do not comprehensively reflect RAG or tool-use capabilities. To preserve performance, strict adherence to the specified prompt and chat templates for task-specific generation is recommended.