Note: Stable Video 3D weights are released under the Stability AI Non-Commercial Research Community License and cannot be used for commercial purposes. Please read the license to verify whether your use case is permitted.
Model Report
stabilityai / Stable Video 3D
Stable Video 3D is a generative model developed by Stability AI that creates orbital videos from single static images, generating 21-frame sequences at 576x576 resolution that simulate a camera rotating around an object. Built on the Stable Video Diffusion architecture and trained on renderings of the Objaverse dataset, it offers two variants: SV3D_u for autonomous camera paths and SV3D_p for user-specified trajectories.
Stable Video 3D (SV3D) is a generative artificial intelligence model developed by Stability AI that specializes in creating orbital videos from a single static image. Building upon the Stable Video Diffusion (SVD) Image-to-Video architecture, SV3D is designed to synthesize short videos that depict an object as if a camera is smoothly circling around it, thereby simulating a 3D view from 2D data. The model enables the transformation of a still image into a multi-perspective visual sequence, which has applications in areas such as visualization, virtual environments, and digital content creation.
SV3D output: A 360-degree orbital video generated from a single still image, illustrating the model's core capability.
SV3D is engineered to generate short orbital videos, typically consisting of 21 frames at a resolution of 576x576 pixels, by conditioning on a single input image of an object. The model introduces two primary variants with distinct control properties: SV3D_u, which autonomously determines the camera path to create an orbital video around the subject without external guidance, and SV3D_p, which accepts specified camera trajectories, affording users enhanced control over the viewpoint sequencing and path of the virtual camera.
Both variants produce coherent, object-centric videos that simulate a smooth rotation around the input subject. The system is fine-tuned from the base SVD Image-to-Video model, extending Stability AI's video diffusion capabilities to 3D-aware generation.
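To make the trajectory-conditioning idea concrete, the sketch below builds the kind of camera path SV3D_p consumes: 21 evenly spaced azimuth angles covering one full orbit, paired with an elevation per frame. The function name and signature are illustrative, not the official API; the released code defines its own conditioning format.

```python
import math

def orbital_trajectory(num_frames=21, elevation_deg=10.0):
    """Hypothetical helper: one full 360-degree orbit sampled at
    num_frames evenly spaced azimuths, with a constant elevation.
    This mirrors the shape of a camera path for SV3D_p, but the
    names here are illustrative rather than the released interface."""
    azimuths = [i * 360.0 / num_frames for i in range(num_frames)]
    elevations = [elevation_deg] * num_frames
    return azimuths, elevations

azimuths, elevations = orbital_trajectory()
# 21 views starting at azimuth 0.0, stepping ~17.14 degrees per frame
```

SV3D_u needs no such input; it chooses the orbit itself, while SV3D_p would condition each of the 21 generated frames on the corresponding pose.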
Model Architecture
The architecture of SV3D is based on the diffusion model approach introduced by Stable Video Diffusion, which utilizes probabilistic sampling to incrementally refine noisy video predictions into realistic video frames. SV3D adapts this architecture to operate with both a single image as input and, for SV3D_p, an explicit camera path specification. This adaptation allows the model to learn complex 3D-aware video representations, simulating novel views of the same object consistent with a camera's motion.
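The incremental refinement described above can be illustrated with a schematic denoising loop. This is a generic sigma-schedule Euler sampler, not the actual SVD/SV3D sampler or network: the toy `model` stands in for the learned video U-Net, and the schedule is a plain linear ramp chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def denoise_step(x, sigma, sigma_next, model):
    """One Euler step of a sigma-schedule sampler (schematic; the real
    sampler uses a learned video U-Net with image and pose conditioning)."""
    denoised = model(x, sigma)           # model's estimate of the clean sample
    d = (x - denoised) / sigma           # direction toward the estimate
    return x + d * (sigma_next - sigma)  # step to the lower noise level

def sample(model, shape, sigmas):
    x = rng.standard_normal(shape) * sigmas[0]  # start from pure noise
    for s, s_next in zip(sigmas[:-1], sigmas[1:]):
        x = denoise_step(x, s, s_next, model)
    return x

# Toy "model" that always predicts zeros; the loop then shrinks the
# noise toward zero, showing the refinement mechanics in isolation.
sigmas = np.linspace(10.0, 0.0, 11)
out = sample(lambda x, sigma: np.zeros_like(x), (2, 2), sigmas)
```

In SV3D the same loop runs jointly over all 21 frames, so each denoising step can keep the novel views mutually consistent with the camera's motion.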
Details of the model's architecture, including network configurations and conditioning mechanisms, are described in the official SV3D technical report and on the associated project page.
Datasets and Training Procedure
The SV3D model is trained using renderings derived from the Objaverse 1.0 dataset, a large-scale collection of 3D models with diverse objects. Stability AI employed an enhanced rendering process to recreate the characteristics and distribution of real-world images, improving the model's generalization ability. The training data was further refined to maximize fidelity and diversity while adhering to the CC-BY license, facilitating responsible and open research.
This training methodology enables SV3D's capability to generate multi-angle video sequences from a single visual reference.
Applications and Use Cases
SV3D is tailored for applications requiring the creation of short, object-centric orbital videos from static images. Primary use cases involve generating 3D-like representations for virtual and augmented reality content, product visualizations in digital marketplaces, and creative content development where multi-angle object views are desirable. The model's output can facilitate rapid prototyping and visualization in contexts where acquiring complete 3D scans or multiple photographs would be impractical.
Limitations and Ethical Considerations
SV3D's outputs are the result of generative processes conditioned on static images and, optionally, a specified camera path. As such, the model is not designed to produce factual or verifiable representations of specific real-world objects, events, or persons. Its use is limited to creative and illustrative purposes and must align with the Stability AI Acceptable Use Policy. Additionally, factual accuracy with respect to real-world identities or scenarios is not guaranteed due to the synthetic nature of the output.