The simplest way to self-host ControlNet SDXL Diffusers Depth. Launch a dedicated cloud GPU server running Lab Station OS to download and serve the model using any compatible app or framework.
Download model weights for local inference. Must be used with a compatible app, notebook, or codebase. May run slowly, or not work at all, depending on your system resources, particularly GPU(s) and available VRAM.
ControlNet SDXL Diffusers Depth enables depth-based control over Stable Diffusion XL image generation. It processes depth maps from multiple sources (Midas, Leres, Zoe) at various resolutions, using smart resampling for consistent results. The Control-LoRA variant reduces model size to 377MB while maintaining functionality.
ControlNet SDXL Diffusers Depth is a specialized AI model designed to add conditional control to Stable Diffusion XL (SDXL) image generation through the use of depth maps. It represents a significant advancement in controlled image generation, building upon the architecture of the ControlNet family while specifically focusing on depth-based image manipulation.
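For readers using the diffusers library directly, a minimal sketch of depth-conditioned SDXL generation follows. The Hugging Face repo ids and the depth-map path are illustrative assumptions; substitute the checkpoint and control image you actually use.

```python
# Minimal sketch: depth-conditioned SDXL generation with the diffusers library.
# Repo ids and file paths are illustrative assumptions, not fixed requirements.
import torch
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline
from diffusers.utils import load_image

# Load the depth ControlNet and attach it to the SDXL base pipeline.
controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-depth-sdxl-1.0",  # assumed repo id
    torch_dtype=torch.float16,
)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# A grayscale depth map (lighter = closer) used as the control image.
depth_map = load_image("depth_map.png")  # placeholder path

image = pipe(
    prompt="a cozy reading nook, warm lighting, photorealistic",
    image=depth_map,
    num_inference_steps=30,
).images[0]
image.save("output.png")
```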
The model exists in multiple variants: diffusers_xl_depth_full.safetensors, diffusers_xl_depth_mid.safetensors, and diffusers_xl_depth_small.safetensors. These variants offer different size-performance tradeoffs, making the technology accessible to users with varying computational resources. The model was trained on a comprehensive dataset combining depth maps from multiple sources, including Midas, Leres, and Zoe depth maps at multiple resolutions (256, 384, and 512 pixels).
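If you only need one of the variant checkpoints, it can be fetched with huggingface_hub. The repo id below is an assumption about where the files are commonly hosted; adjust it and the destination directory to match your setup.

```python
# Sketch: fetching one variant checkpoint with huggingface_hub.
from huggingface_hub import hf_hub_download

weights_path = hf_hub_download(
    repo_id="lllyasviel/sd_control_collection",     # assumed repo id
    filename="diffusers_xl_depth_mid.safetensors",  # mid-size variant
    local_dir="./models/controlnet",                # placeholder destination
)
print(weights_path)
```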
The architecture employs data augmentation techniques to improve robustness and reduce overfitting to specific depth estimation methods. This includes random left-right flipping and other augmentation strategies that help create a more generalized model. The training process utilized an unbiased dataset approach, which helped minimize the model's tendency to overfit to particular depth estimation techniques.
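As an illustration only (not the actual training code), the key point of the left-right flip augmentation is that the source image and its depth map must be flipped together so they stay spatially aligned:

```python
# Illustrative sketch: apply the same random horizontal flip to the image and
# its depth map so the conditioning stays aligned with the target.
import random
import numpy as np

def random_horizontal_flip(image: np.ndarray, depth: np.ndarray, p: float = 0.5):
    """Flip both arrays along the width axis with probability p."""
    if random.random() < p:
        image = image[:, ::-1].copy()
        depth = depth[:, ::-1].copy()
    return image, depth
```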
The model processes depth maps, which are grayscale images representing distance from the camera, through specialized preprocessors (annotators) before feeding them into the ControlNet model. Several depth preprocessors are available, including:
depth_midas
depth_zoe
depth_leres++
depth_leres
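A usage sketch with the third-party controlnet_aux package (a standalone port of the web UI annotators) shows how a photo is turned into a depth control image; the annotator repo id is an assumption, and LeresDetector can be used in the same way.

```python
# Sketch: producing depth control images with controlnet_aux annotators.
from controlnet_aux import MidasDetector, ZoeDetector
from diffusers.utils import load_image

source = load_image("photo.jpg")  # placeholder input image

midas = MidasDetector.from_pretrained("lllyasviel/Annotators")  # assumed repo id
depth_midas = midas(source)  # MiDaS-style depth map

zoe = ZoeDetector.from_pretrained("lllyasviel/Annotators")
depth_zoe = zoe(source)      # ZoeDepth-style depth map

depth_midas.save("depth_midas.png")
depth_zoe.save("depth_zoe.png")
```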
Each preprocessor offers a different approach to extracting and representing depth information, which affects the final output quality. The model also supports multiple control modes (in the sd-webui-controlnet extension these are Balanced, "My prompt is more important," and "ControlNet is more important"), which allow users to fine-tune the balance between the text prompt and the depth map's influence on the generated image. As detailed in the Civitai guide, users can adjust parameters such as "Control Weight," "Starting Control Step," and "Ending Control Step" to precisely control the depth map's influence during the diffusion process.
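In the diffusers API, roughly equivalent knobs are exposed as pipeline arguments; the mapping to the web UI sliders below is an approximation, and the snippet reuses the `pipe` and `depth_map` objects from the earlier sketch.

```python
# Sketch: approximate diffusers equivalents of the web UI control sliders.
image = pipe(
    prompt="a cozy reading nook, warm lighting, photorealistic",
    image=depth_map,
    controlnet_conditioning_scale=0.7,  # ~ "Control Weight"
    control_guidance_start=0.0,         # ~ "Starting Control Step" (fraction of steps)
    control_guidance_end=0.8,           # ~ "Ending Control Step" (fraction of steps)
    num_inference_steps=30,
).images[0]
```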
The model is supported by the sd-webui-controlnet extension (version 1.1.400 and later) and has been tested across various VRAM configurations. For optimal performance on GPUs with limited VRAM, the web UI can be launched with the --medvram-sdxl flag, or with --lowvram on the most constrained systems.
The extension implements a smart resampling algorithm that ensures pixel-perfect control images regardless of resolution. Multiple ControlNet instances can be used simultaneously, allowing for complex control over image generation through the combination of different control types.
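The sketch below is a simplified illustration of the idea behind such resampling, not the extension's actual algorithm: the control image is upscaled to cover the generation resolution and then center-cropped so every generated pixel has a corresponding control pixel.

```python
# Simplified illustration: fit a control image exactly to the target resolution.
from PIL import Image

def fit_control_image(control: Image.Image, width: int, height: int) -> Image.Image:
    # Scale so the control image covers the target, then center-crop to size.
    scale = max(width / control.width, height / control.height)
    resized = control.resize(
        (round(control.width * scale), round(control.height * scale)),
        resample=Image.LANCZOS,
    )
    left = (resized.width - width) // 2
    top = (resized.height - height) // 2
    return resized.crop((left, top, left + width, top + height))

depth = Image.open("depth_map.png")            # placeholder path
control = fit_control_image(depth, 1024, 1024)
```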