The simplest way to self-host Vicuna 13B. Launch a dedicated cloud GPU server running Lab Station OS to download and serve the model using any compatible app or framework.
Download model weights for local inference. Must be used with a compatible app, notebook, or codebase. May run slowly, or not work at all, depending on your system resources, particularly GPU(s) and available VRAM.
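To gauge whether your GPU(s) can hold the weights at all, a rough back-of-envelope estimate helps. The sketch below computes the memory needed just for a 13B model's parameters at common precisions; real usage is higher because activations and the KV cache also consume VRAM, so treat these as lower bounds.

```python
# Rough VRAM estimate for a 13B-parameter model (weights only; activations
# and KV cache add more, so these are lower bounds).
PARAMS = 13e9

def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights, in GiB."""
    return params * bytes_per_param / 1024**3

fp16 = weight_memory_gb(PARAMS, 2)    # half precision
int8 = weight_memory_gb(PARAMS, 1)    # 8-bit quantized
int4 = weight_memory_gb(PARAMS, 0.5)  # 4-bit quantized

print(f"fp16: ~{fp16:.0f} GiB, int8: ~{int8:.0f} GiB, int4: ~{int4:.0f} GiB")
```

At half precision the weights alone need roughly 24 GiB, which is why a single 24 GB consumer card is borderline and quantized variants are popular for local inference.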
Vicuna-13B is a conversational AI model fine-tuned from LLaMA using 70,000+ ShareGPT conversations. It handles 2048-token contexts and performs at ~90% of ChatGPT's level in GPT-4 evaluations. Notable for multi-turn dialogue capability and efficient memory handling through gradient checkpointing and flash attention.
Vicuna-13B is a large language model chatbot developed by LMSYS, created by fine-tuning the LLaMA base model. The model uses the transformer architecture (specifically the LlamaForCausalLM implementation) and was designed primarily for research on LLMs and chatbot development. It is notable among open-source language models for approaching the quality of proprietary chatbots at a small fraction of their training cost, while remaining accessible to researchers and developers.
The model was trained through supervised instruction fine-tuning on approximately 70,000 conversations sourced from ShareGPT.com, where users share their ChatGPT conversations. This approach improved upon the Stanford Alpaca recipe by better handling multi-turn conversations and longer sequences, increasing the maximum context length from 512 to 2048 tokens.
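One way to picture the multi-turn handling is to flatten each conversation into a single training sequence and compute the loss only on the assistant's replies. The sketch below is illustrative, not FastChat's actual preprocessing code; the "USER:"/"ASSISTANT:" template follows Vicuna's conversation convention, but the system prompt and function names here are assumptions.

```python
# Sketch: flatten a multi-turn ShareGPT-style conversation into one training
# string and record which character spans contribute to the loss (assistant
# replies only). Illustrative only; not FastChat's real preprocessing.
SYSTEM = ("A chat between a curious user and an artificial intelligence "
          "assistant.")

def build_training_example(turns):
    """turns: list of (role, text) with role in {'user', 'assistant'}."""
    text = SYSTEM + " "
    loss_spans = []  # (start, end) character offsets trained on
    for role, msg in turns:
        if role == "user":
            text += f"USER: {msg} "
        else:
            text += "ASSISTANT: "
            start = len(text)
            text += msg
            loss_spans.append((start, len(text)))
            text += "</s>"  # end-of-turn marker
    return text, loss_spans

convo = [("user", "What is Vicuna?"),
         ("assistant", "An open chatbot fine-tuned from LLaMA.")]
full, spans = build_training_example(convo)
s, e = spans[0]
print(full[s:e])  # the only span that contributes to the loss
```

Masking the user turns this way lets the model learn to produce responses without being trained to imitate user inputs.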
The training process utilized several optimization techniques:

- Memory optimizations: gradient checkpointing and flash attention, which make the expanded 2048-token context feasible within available GPU memory.
- Multi-turn conversation handling: the training loss is computed only on the chatbot's responses, not on user turns.
- Cost reduction: managed spot instances with automatic recovery, substantially lowering compute costs.
The training was conducted using 8 A100 GPUs, with the total cost approximately $300 - a remarkably efficient figure achieved through the use of spot instances and optimization techniques. The training code and model weights are publicly available through the FastChat GitHub repository.
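The gradient-checkpointing idea mentioned above trades compute for memory: activations inside each block are discarded after the forward pass and recomputed during backward. A minimal PyTorch sketch (not the actual FastChat training code) looks like this:

```python
import torch
from torch.utils.checkpoint import checkpoint

# Minimal gradient-checkpointing sketch: activations inside each block are
# recomputed on the backward pass instead of stored, saving memory at the
# cost of extra compute. Illustrative; not FastChat's real training loop.
class Block(torch.nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.linear = torch.nn.Linear(dim, dim)

    def forward(self, x):
        return torch.relu(self.linear(x))

blocks = torch.nn.ModuleList(Block(16) for _ in range(4))
x = torch.randn(2, 16, requires_grad=True)

h = x
for block in blocks:
    # use_reentrant=False is the recommended modern form of this API
    h = checkpoint(block, h, use_reentrant=False)

h.sum().backward()
print(x.grad.shape)  # gradients still flow despite discarded activations
```

For a 13B model this kind of trade-off is what allows long sequences to fit on 8 A100s during fine-tuning.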
Preliminary evaluations using GPT-4 as a judge indicated that Vicuna-13B achieves more than 90% of the quality of OpenAI's ChatGPT and Google Bard, and that it outperforms the base LLaMA model and Stanford Alpaca in more than 90% of cases. The evaluation, detailed in an accompanying research write-up, compared responses to 80 diverse questions, with GPT-4 judging helpfulness, relevance, accuracy, and level of detail.
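Aggregating such pairwise judge verdicts into a headline number is straightforward; the sketch below shows one common convention (ties counted as half a win). The verdict data here is made up for illustration and does not reproduce the actual evaluation results.

```python
from collections import Counter

# Sketch of turning pairwise judge verdicts into a win rate, in the spirit
# of the GPT-4-as-judge setup. The verdicts below are hypothetical.
def win_rate(verdicts):
    """verdicts: list of 'win' / 'tie' / 'loss' for the candidate model."""
    counts = Counter(verdicts)
    # Ties are often counted as half a win in pairwise comparisons.
    return (counts["win"] + 0.5 * counts["tie"]) / len(verdicts)

sample = ["win"] * 70 + ["tie"] * 10 + ["loss"] * 20  # 100 hypothetical judgments
print(f"win rate: {win_rate(sample):.2f}")
```

Note that single-judge evaluations of this kind are themselves noisy, which is why the original write-up described them as preliminary.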
Key capabilities include:

- Multi-turn dialogue: the model maintains context across extended conversations.
- Long inputs: contexts of up to 2048 tokens are supported.
- Detailed, well-structured answers that approach ChatGPT quality in GPT-4-judged comparisons.
However, the model does have some limitations, particularly in:

- Tasks involving reasoning or mathematics
- Accurately identifying itself or guaranteeing the factual accuracy of its outputs
- Safety: it has not been fully optimized to mitigate potential toxicity or bias
The latest version, Vicuna-13b-v1.5, represents an evolution of the original model, fine-tuned from Llama 2 using an expanded dataset of approximately 125,000 conversations. This version has shown improved performance across various benchmarks, as demonstrated in the MT-bench and Chatbot Arena evaluations.
The model family includes both 7B and 13B parameter versions, with the 13B version generally showing superior performance across most metrics. Evaluation results are regularly updated on the Chatbot Arena leaderboard, providing transparent comparison with other models in the field.
The model is available under the Llama 2 Community License Agreement, with the code released under the Apache License 2.0. The online demo's usage is subject to additional terms including the LLaMA license, OpenAI's terms of use, and ShareGPT's privacy practices. While the model weights and code are publicly available, the training dataset has not been released publicly.