Model Report
stabilityai / ControlNet SDXL Depth
ControlNet SDXL Depth is a conditional control model that enables depth map-guided image generation using the Stable Diffusion XL framework. The model processes depth information from various sources including MiDaS, Leres, and Zoe Depth methods to constrain image synthesis according to spatial relationships and three-dimensional scene structure, allowing users to generate images that adhere to specific geometric arrangements while maintaining the base diffusion model's generative capabilities.
The depth-specific variant of ControlNet 1.1 for Stable Diffusion 1.5, formally designated control_v11f1p_sd15_depth, is a generative model that guides Stable Diffusion 1.5 with depth maps. As part of the ControlNet 1.1 release, it preserves the architecture of previous ControlNet models while introducing improvements in robustness and quality that yield more accurate image synthesis conditioned on depth estimates. By integrating depth information, the model extends the diffusion process to produce images that adhere closely to the spatial constraints and three-dimensional cues embedded in an input depth map.
Visualization of depth map-based image generation using a grayscale depth map and resulting portrait outputs. Prompt: Women’s portrait synthesis with grayscale depth control.
This model is grounded in the ControlNet 1.1 architecture, closely following the network design of ControlNet 1.0. The model functions as a conditional control system layered upon the Stable Diffusion framework, providing additional input channels that align the generative process with external structural cues, particularly those encoded in depth maps.
Depth maps, which may originate from monocular depth estimation algorithms or real-world 3D renderers, are preprocessed and fed into the model as conditioning input. This depth-aware conditioning enables the diffusion process to respect the foreground-background relationships, occlusion order, and relative distances expressed in the map. It is achieved without architectural divergence from earlier ControlNet versions; the developers deliberately deferred major changes to future releases to preserve stability and reproducibility.
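As an illustration of this preprocessing step, the sketch below derives a grayscale depth map from an RGB image using a MiDaS-family estimator exposed through the Hugging Face transformers depth-estimation pipeline. The Intel/dpt-large checkpoint, file names, and 512-pixel target resolution are illustrative assumptions rather than requirements of the model.

```python
# Minimal sketch: derive a grayscale depth map from an RGB image for use as
# a ControlNet conditioning input. "Intel/dpt-large" is one MiDaS-family
# estimator; any monocular depth model producing a single-channel map could
# be substituted.
from PIL import Image
from transformers import pipeline

depth_estimator = pipeline("depth-estimation", model="Intel/dpt-large")

image = Image.open("input.jpg").convert("RGB")
result = depth_estimator(image)           # dict with "depth" (PIL image) and "predicted_depth" (tensor)
depth_map = result["depth"].convert("L")  # single-channel depth map

# The conditioning image is typically resized to the generation resolution
# and expanded to three channels before being passed to the pipeline.
depth_map = depth_map.resize((512, 512)).convert("RGB")
depth_map.save("depth_control.png")
```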
ControlNet's codebase is primarily implemented in Python, facilitating modifiability and integration within established diffusion pipelines.
Training Data and Methodology
The depth-specific variant of ControlNet 1.1 was trained on a multi-source dataset that combines depth maps produced by the MiDaS, Leres, and Zoe Depth methods. Training incorporated data augmentation, including random left-right flipping, and used depth maps at multiple input resolutions (256, 384, and 512 pixels) to bolster generalization across variations in scale and source.
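The following sketch, which is not the authors' training code, illustrates the paired augmentation described above: a shared random left-right flip and a randomly selected training resolution applied identically to an image and its depth map.

```python
import random
from PIL import Image

# Illustrative sketch of the paired augmentation: flip decisions and the
# chosen resolution must be shared between the RGB image and its depth map
# so the spatial correspondence is preserved.
RESOLUTIONS = (256, 384, 512)

def augment_pair(image: Image.Image, depth: Image.Image) -> tuple[Image.Image, Image.Image]:
    if random.random() < 0.5:
        image = image.transpose(Image.FLIP_LEFT_RIGHT)
        depth = depth.transpose(Image.FLIP_LEFT_RIGHT)
    size = random.choice(RESOLUTIONS)
    return image.resize((size, size)), depth.resize((size, size))
```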
Key improvements over prior models address several data quality challenges. Corrections to the training set eliminated duplicated grayscale images and low-quality samples, while also refining the correspondence between images and textual prompts. These enhancements yielded a model that is not tuned to any singular depth estimation method, resulting in robust performance even when driven by depth maps from novel sources or varying preprocessing pipelines.
This model principally serves to guide the image generation capabilities of Stable Diffusion 1.5 according to the constraints imposed by input depth maps. The resulting system is able to generate images consistent with the geometric arrangement and spatial relationships of objects specified by the depth input. This is particularly useful in tasks requiring fidelity to three-dimensional scene layout, such as the generation of photorealistic portraits, interior renderings, and arbitrary scene synthesis from sketched or programmatically derived depth cues.
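A minimal inference sketch using the diffusers library is shown below. The base-model repository ID, scheduler choice, prompt, and sampling settings are assumptions for illustration; the ControlNet checkpoint name follows the designation given above.

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline, UniPCMultistepScheduler
from diffusers.utils import load_image

# Sketch of depth-conditioned generation; repository IDs for the SD 1.5 base
# model and the depth ControlNet are illustrative assumptions.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11f1p_sd15_depth", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()

depth_map = load_image("depth_control.png")  # preprocessed depth map, as above
result = pipe(
    "a photorealistic portrait, soft studio lighting",
    image=depth_map,
    num_inference_steps=30,
).images[0]
result.save("portrait.png")
```

The same pattern extends to other ControlNet 1.1 modalities by swapping the ControlNet checkpoint and supplying the corresponding conditioning image.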
Depth-based control is one modality among several in the broader ControlNet 1.1 family of models, which encompasses additional control types including edge, normal map, scribble, line art, soft edge, segmentation, pose (OpenPose), inpainting, and stylization controls.
Resource-efficient deployment is addressed by Control-LoRA variants, which employ low-rank adaptation to reduce checkpoint size from approximately 4.7 GB to as little as 377 MB. This makes depth, edge, and stylization controls accessible on consumer hardware with a reduced computational footprint. The LoRA-based models retain the core functionality of their larger ControlNet counterparts, offering similar control while broadening accessibility.
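The back-of-the-envelope sketch below illustrates why a low-rank factorization stores far fewer values than a full weight update; the layer dimensions and rank are hypothetical and do not reflect the actual Control-LoRA configuration.

```python
# A rank-r update stores d_out*r + r*d_in values instead of d_out*d_in for a
# full weight matrix. Dimensions and rank here are illustrative only.
def full_params(d_out: int, d_in: int) -> int:
    return d_out * d_in

def lora_params(d_out: int, d_in: int, rank: int) -> int:
    return d_out * rank + rank * d_in

d_out, d_in, rank = 1280, 1280, 128
print(full_params(d_out, d_in))        # 1,638,400 values for a full update
print(lora_params(d_out, d_in, rank))  # 327,680 values at rank 128 (~5x smaller)
```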
Limitations and Considerations
While this model generally provides robust depth conditioning, certain modalities in the ControlNet family—such as the experimental instruct-based ip2p and the stylization-oriented shuffle—may require user discretion and iterative refinement for optimal results. Additionally, some specialized models like the anime-focused line art variant (control_v11p_sd15s2_lineart_anime.pth) are contingent on specific model checkpoints and do not support all operational modes present in other controls.
The license governing this model and its family of models has not been explicitly detailed in the available documentation.
External Resources
For further technical insights, code, and downloads: