The simplest way to self-host Vicuna 7B. Launch a dedicated cloud GPU server running Lab Station OS to download and serve the model using any compatible app or framework.
Download model weights for local inference. Must be used with a compatible app, notebook, or codebase. May run slowly, or not work at all, depending on your system resources, particularly GPU(s) and available VRAM.
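For local inference, a common route is the Hugging Face `transformers` library. The sketch below is a minimal, hedged example: the `lmsys/vicuna-7b-v1.5` model id and the v1.1-style conversation template follow FastChat conventions, but verify both against the versions you have installed. In fp16 the 7B model needs roughly 14 GB of VRAM.

```python
# Sketch: running Vicuna-7B locally with Hugging Face transformers.
# Model id and prompt template are assumptions based on FastChat
# conventions; adjust for your installed versions.

def build_vicuna_prompt(user_message: str) -> str:
    """Format a single-turn prompt in the Vicuna v1.1 conversation style."""
    system = ("A chat between a curious user and an artificial intelligence "
              "assistant. The assistant gives helpful, detailed, and polite "
              "answers to the user's questions.")
    return f"{system} USER: {user_message} ASSISTANT:"

def generate(user_message: str, model_id: str = "lmsys/vicuna-7b-v1.5") -> str:
    # Heavy imports stay inside the function so the prompt helper
    # can be used without transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.float16, device_map="auto")
    inputs = tokenizer(build_vicuna_prompt(user_message),
                       return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=256)
    # Strip the prompt tokens, keep only the assistant's reply.
    reply = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(reply, skip_special_tokens=True)
```

If the full model does not fit in VRAM, quantized variants (e.g. GGUF builds served through llama.cpp-based apps) are the usual fallback.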
Vicuna-7B is a conversational AI model fine-tuned from LLaMA using 125,000 ShareGPT conversations. It features extended context length (2048 tokens) and achieves strong performance on dialog tasks. Notable for its multi-turn conversation handling and improved training methodology over Stanford Alpaca.
Vicuna-7B is an open-source chatbot created by the LMSYS team through fine-tuning the LLaMA language model. As detailed in the original announcement, it represents a significant advancement in accessible, high-quality language models, achieving impressive performance while maintaining reasonable training costs.
The model is based on the transformer architecture and was initially developed as a fine-tuned version of LLaMA; version 1.5 was later rebased on Llama 2. Training used approximately 125,000 conversations collected from ShareGPT.com, a substantially larger and more conversational dataset than those used by previous approaches.
The total training cost was approximately $140, making it an economically viable option for research and development. The training process improved upon the Stanford Alpaca methods by better handling multi-turn conversations and increasing the maximum context length from 512 to 2048 tokens.
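One of the multi-turn improvements over Alpaca-style training is computing the loss only on the assistant's tokens, so every assistant turn in a conversation contributes training signal while user turns are masked out. The sketch below illustrates that masking idea with a toy whitespace tokenizer; real training would use the LLaMA tokenizer and a cross-entropy loss with `ignore_index=-100`.

```python
# Illustrative sketch of loss masking for multi-turn fine-tuning:
# labels mirror the tokens on assistant turns and are IGNORE_INDEX
# elsewhere, so only assistant text is trained on. The whitespace
# "tokenizer" here is a stand-in, not the real one.

IGNORE_INDEX = -100  # PyTorch CrossEntropyLoss's default ignore_index

def build_labels(turns):
    """turns: list of (speaker, text) pairs, speaker in {'user', 'assistant'}.

    Returns parallel token and label lists."""
    tokens, labels = [], []
    for speaker, text in turns:
        for tok in text.split():
            tokens.append(tok)
            labels.append(tok if speaker == "assistant" else IGNORE_INDEX)
    return tokens, labels
```

For example, a two-turn exchange `[("user", "hi there"), ("assistant", "hello friend")]` yields labels that ignore the two user tokens and keep the two assistant tokens.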
The model's performance has been extensively evaluated through multiple methodologies, as detailed in the evaluation paper. A preliminary evaluation using GPT-4 as a judge showed that Vicuna achieved over 90% of the quality of ChatGPT and Google Bard, surpassing both LLaMA and Stanford Alpaca in over 90% of cases.
The research demonstrated that strong LLMs like GPT-4 can accurately reflect human preferences when judging other LLMs, achieving over 80% agreement with human judgments. This validation approach has proven both scalable and explainable, though the authors acknowledge limitations in Vicuna's performance on reasoning, mathematics, and factual-accuracy tasks.
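The pairwise "LLM as judge" setup can be sketched as a prompt builder: show the judge model the question and two candidate answers, then parse a verdict from its reply. The rubric wording below is illustrative, not the exact prompt from the evaluation paper.

```python
# Sketch of a pairwise LLM-as-judge prompt. Send the resulting string
# to a strong model (e.g. GPT-4) via its API and parse the trailing
# verdict token from the reply. Wording is an assumption, not the
# paper's exact prompt.

def build_judge_prompt(question: str, answer_a: str, answer_b: str) -> str:
    return (
        "You are an impartial judge. Compare the two assistant answers to "
        "the user question below, considering helpfulness, relevance, "
        "accuracy, and level of detail. End your reply with a verdict: "
        "'[[A]]', '[[B]]', or '[[C]]' for a tie.\n\n"
        f"[Question]\n{question}\n\n"
        f"[Answer A]\n{answer_a}\n\n"
        f"[Answer B]\n{answer_b}\n"
    )
```

Swapping the order of the two answers and averaging verdicts is a common guard against position bias in this kind of evaluation.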
Vicuna is available in both 7B and 13B parameter versions, with the 13B variant showing slightly superior performance in most benchmarks. The models are primarily intended for research into LLMs and chatbots, targeting researchers and hobbyists in AI and NLP fields.
The project includes a lightweight distributed serving system capable of handling multiple models and utilizing both on-premise and cloud-based GPU workers. For safety considerations, the online demo employs OpenAI's moderation API to filter inappropriate inputs.
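Input gating with a moderation endpoint can be sketched as below. The decision logic is separated from the network call so it can be tested offline; the `client.moderations.create` call follows the `openai>=1.0` Python SDK, and the chat-backend hook is a hypothetical placeholder.

```python
# Sketch: filtering user input through a moderation API before it
# reaches the model, as the online demo does. The decision helper
# works on a plain dict so it needs no network access.

def is_allowed(moderation_result: dict) -> bool:
    """Return True if no result in the moderation response is flagged."""
    return not any(r.get("flagged", False)
                   for r in moderation_result.get("results", []))

def moderated_reply(user_input: str) -> str:
    from openai import OpenAI
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    result = client.moderations.create(input=user_input)
    if not is_allowed(result.model_dump()):
        return "Sorry, that request was flagged by the moderation filter."
    return serve_model(user_input)  # hypothetical call into the chat backend
```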
The v1.5 model weights are released under the Llama 2 Community License Agreement; earlier versions, built on the original LLaMA, were restricted to non-commercial use. The code is publicly available, but the training data itself (the ShareGPT conversations) has not been released.