Note: Stable Video 4D weights are released under a Stability AI Non-Commercial Research Community License and cannot be used for commercial purposes. Please read the license to verify whether your use case is permitted.
The simplest way to self-host Stable Video 4D. Launch a dedicated cloud GPU server running Lab Station OS to download and serve the model using any compatible app or framework.
Download model weights for local inference. Must be used with a compatible app, notebook, or codebase. May run slowly, or not work at all, depending on your system resources, particularly GPU(s) and available VRAM.
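For the local option, a quick check of the GPUs and VRAM on your machine can save a failed run. The snippet below is a minimal sketch assuming a PyTorch-based runtime; the actual memory requirements depend on the app, notebook, or codebase you pair the weights with.

```python
# Pre-flight check before attempting local inference.
# Assumes a PyTorch runtime; no official VRAM requirement is implied here.
import torch

if not torch.cuda.is_available():
    print("No CUDA GPU detected; inference will be very slow or may fail.")
else:
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        vram_gb = props.total_memory / 1024**3
        print(f"GPU {i}: {props.name}, {vram_gb:.1f} GiB VRAM")
```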
Stable Video 4D generates multi-view videos from a single input video, producing 40 frames (5 video frames across 8 camera views). It uses a two-stage approach combining video generation and novel view synthesis, trained on ObjaverseDy data. Output is 576x576 resolution, with interpolation for longer sequences.
Stable Video 4D (SV4D) is a groundbreaking generative AI model developed by Stability AI that creates multiple novel-view videos from a single input video. The model represents a significant advancement in dynamic 3D content generation by unifying video generation and novel view synthesis into a single latent diffusion model, as detailed in the technical report.
The architecture builds on previous models in the family: Stable Video Diffusion (SVD) and Stable Video 3D (SV3D). Given a single-view input video, SV4D first uses SV3D to generate an orbital video from the initial frame. This orbital video, along with the input video frames, serves as conditioning for SV4D's 4D sampling process. The model incorporates specialized view and frame attention blocks to improve multi-view and temporal consistency.
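The two-stage flow can be sketched as follows. The classes and method names below are illustrative placeholders standing in for SV3D and SV4D, not the actual Stability AI API; the stubs return random tensors so the data flow and tensor shapes are explicit and the sketch runs end to end.

```python
# Illustrative sketch of the SV3D -> SV4D two-stage pipeline described above.
import torch

V, T, RES = 8, 5, 576  # 8 camera views, 5 video frames, 576x576 output


class OrbitalViewModel:
    """Stand-in for SV3D: single image -> orbital multi-view sequence."""
    def generate_orbit(self, frame: torch.Tensor) -> torch.Tensor:
        assert frame.shape == (3, RES, RES)
        return torch.rand(V, 3, RES, RES)  # one image per camera view


class MultiViewVideoModel:
    """Stand-in for SV4D: latent diffusion with view/frame attention blocks."""
    def sample(self, video_frames: torch.Tensor,
               reference_views: torch.Tensor) -> torch.Tensor:
        # Conditioned on the input video (temporal) and the SV3D orbit (views),
        # the real model denoises the full view x frame grid jointly.
        return torch.rand(V, video_frames.shape[0], 3, RES, RES)


def generate_4d(input_video: torch.Tensor) -> torch.Tensor:
    """input_video: (T, 3, 576, 576) single-view frames -> (V, T, 3, 576, 576)."""
    sv3d, sv4d = OrbitalViewModel(), MultiViewVideoModel()
    reference_views = sv3d.generate_orbit(input_video[0])  # stage 1
    return sv4d.sample(input_video, reference_views)       # stage 2


grid = generate_4d(torch.rand(T, 3, RES, RES))
print(grid.shape)  # torch.Size([8, 5, 3, 576, 576]) -> the 40-frame matrix
```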
A key innovation is the model's mixed sampling scheme, which enables sequential processing of longer videos while maintaining consistency. The output is a 4D image matrix of 40 frames at 576x576 resolution: 5 video frames across each of 8 camera views. For longer videos, the initially generated frames serve as anchors for interpolating the remaining frames.
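The anchor-then-interpolate idea can be illustrated with plain tensors. In the real pipeline the in-between frames are filled in by the diffusion model conditioned on the anchors; the linear blend below is only a placeholder showing which frame indices act as anchors and which get interpolated, and the frame count of 21 is an assumption for illustration.

```python
# Sketch of anchor selection and interpolation for a longer input video.
import torch

V, RES = 8, 576
num_input_frames = 21                                    # assumed longer input
anchor_idx = torch.linspace(0, num_input_frames - 1, steps=5).long()

anchors = torch.rand(V, 5, 3, RES, RES)                  # output of the anchor pass
full = torch.zeros(V, num_input_frames, 3, RES, RES)
full[:, anchor_idx] = anchors

# Fill frames between consecutive anchors (placeholder for model-based infill).
for k in range(len(anchor_idx) - 1):
    lo, hi = anchor_idx[k].item(), anchor_idx[k + 1].item()
    for t in range(lo + 1, hi):
        w = (t - lo) / (hi - lo)
        full[:, t] = (1 - w) * anchors[:, k] + w * anchors[:, k + 1]

print(anchor_idx.tolist())  # [0, 5, 10, 15, 20]
print(full.shape)           # torch.Size([8, 21, 3, 576, 576])
```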
SV4D was trained on ObjaverseDy, a carefully curated subset of the Objaverse dataset containing dynamic 3D objects. The training process fine-tuned pre-trained SV3D and SVD models to leverage their existing priors. Objects were selected after a license review to ensure suitability for training, and the curation aims to better reflect real-world image distributions.
The model's performance has been extensively evaluated across multiple datasets including ObjaverseDy, Consistent4D, and DAVIS. Metrics such as LPIPS, CLIP-score, and a modified FVD were used to assess both video-frame and view consistency. According to the research paper, SV4D achieves state-of-the-art performance compared to baseline models like SV3D, Diffusion 2, STAG4D, Consistent4D, and DreamGaussian4D.
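As a concrete illustration, the snippet below shows one way a frame-wise LPIPS score could be computed between generated and ground-truth multi-view videos using the `lpips` package. This is a sketch under assumptions about tensor layout, not the paper's exact evaluation protocol (which also uses CLIP-score and a modified FVD).

```python
# Frame-wise LPIPS between generated and reference multi-view videos.
import torch
import lpips  # pip install lpips

loss_fn = lpips.LPIPS(net="alex")  # expects inputs scaled to [-1, 1]

def multiview_lpips(generated: torch.Tensor, reference: torch.Tensor) -> float:
    """generated, reference: (V, T, 3, H, W) tensors in [-1, 1]."""
    v, t = generated.shape[:2]
    flat_gen = generated.reshape(v * t, *generated.shape[2:])
    flat_ref = reference.reshape(v * t, *reference.shape[2:])
    with torch.no_grad():
        scores = loss_fn(flat_gen, flat_ref)  # one score per (view, frame) pair
    return scores.mean().item()

gen = torch.rand(8, 5, 3, 256, 256) * 2 - 1   # dummy data for illustration
ref = torch.rand(8, 5, 3, 256, 256) * 2 - 1
print(f"mean LPIPS: {multiview_lpips(gen, ref):.3f}")
```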
A notable improvement is SV4D's 4D optimization approach, which uses photometry-based techniques instead of the computationally expensive score-distillation sampling (SDS) losses used in previous methods. This results in significantly faster generation times of approximately 15-20 minutes per object. The model also employs a spatio-temporal classifier-free guidance (CFG) scaling scheme to enhance output quality.
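The CFG scaling idea can be sketched with the standard classifier-free guidance update applied with a scale that varies across views and frames. The linear ramp, guidance range, and latent shapes below are assumptions for illustration; SV4D's exact spatio-temporal schedule is described in the paper.

```python
# Classifier-free guidance with a per-view, per-frame scale (illustrative).
import torch

V, T = 8, 5
s_min, s_max = 1.0, 3.0  # assumed guidance range

# Assumed schedule: scale grows with distance from the reference view and
# from the conditioning frame.
view_ramp = torch.linspace(0.0, 1.0, V).view(V, 1)
frame_ramp = torch.linspace(0.0, 1.0, T).view(1, T)
scale = s_min + (s_max - s_min) * 0.5 * (view_ramp + frame_ramp)  # (V, T)

def guided_noise(eps_uncond: torch.Tensor, eps_cond: torch.Tensor) -> torch.Tensor:
    """eps_*: (V, T, C, H, W) noise predictions from one denoising step."""
    s = scale.view(V, T, 1, 1, 1)
    return eps_uncond + s * (eps_cond - eps_uncond)  # standard CFG update

eps_u = torch.randn(V, T, 4, 72, 72)  # latent-space shapes are assumptions
eps_c = torch.randn(V, T, 4, 72, 72)
print(guided_noise(eps_u, eps_c).shape)
```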
SV4D's primary applications include artistic creation, design applications, and research into generative models. The model can generate 5-frame videos across 8 views in approximately 40 seconds, with full 4D optimization taking 20-25 minutes. It shows particular promise for applications in game development, video editing, and virtual reality.
An important limitation is that the model is not trained to generate factually accurate representations of people or events; such applications are considered out of scope. All usage must comply with Stability AI's Acceptable Use Policy.
The model is released under the Stability AI Community License, which permits free use for research, non-commercial, and commercial purposes for organizations and individuals with annual revenue under US$1,000,000. Organizations exceeding this threshold require an enterprise commercial license from Stability AI.