ControlNet SDXL Depth enables spatial control over image generation by incorporating depth map information. It processes grayscale depth maps from multiple sources (Midas, Leres, Zoe) to understand object distances and positioning. Available in full (4.7GB) and LoRA variants (377MB-738MB) for different resource needs.
ControlNet SDXL Depth represents a significant advancement in controlled image generation, building upon the foundation established by the original ControlNet architecture. This model specializes in incorporating depth information into the image generation process, allowing for precise control over the three-dimensional aspects of generated images.
The model maintains the core ControlNet architecture while introducing specific optimizations for depth-based control. It was trained on a comprehensive dataset combining multiple depth map sources, including Midas, Leres, and Zoe depth maps, at various resolutions (256, 384, and 512 pixels). This diverse training approach, incorporating multiple depth map generators and resolutions, contributes to the model's robust performance across different input types.
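To illustrate the kind of conditioning input described above, here is a minimal sketch (NumPy only; the function name is my own, not part of ControlNet) that normalizes a raw floating-point depth map into the 8-bit grayscale format depth preprocessors emit:

```python
import numpy as np

def depth_to_grayscale(depth: np.ndarray) -> np.ndarray:
    """Normalize a float depth map to an 8-bit grayscale image.

    Depth conditioning conventionally renders near objects bright and
    far objects dark, so the range is inverted after normalization.
    """
    d = depth.astype(np.float64)
    d = (d - d.min()) / max(d.max() - d.min(), 1e-8)  # scale to [0, 1]
    d = 1.0 - d  # invert: nearer (smaller distance) -> brighter
    return (d * 255.0).round().astype(np.uint8)

# Example: a 4x4 synthetic depth map with distance increasing row by row
depth = np.linspace(0.5, 4.0, 16).reshape(4, 4)
gray = depth_to_grayscale(depth)
```

The resulting array can be saved as a grayscale PNG and resized to any of the training resolutions (256, 384, or 512 pixels) before being passed to the model.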
Two main variants exist: the full ControlNet implementation and the Control-LoRA version. The Control-LoRA variant, developed by Stability AI, offers a more efficient alternative at approximately 738MB (Rank 256) or 377MB (Rank 128), compared with the original 4.7GB model. This version was trained specifically on depth results from the MiDaS dpt_beit_large_512 model and further refined using Stability AI's Portrait Depth Estimation technology, as detailed in the Control-LoRA documentation.
ControlNet SDXL Depth excels at steering image generation from grayscale depth maps, which encode each object's distance from the camera. The model supports multiple depth map preprocessors:

- MiDaS (depth_midas)
- LeReS (depth_leres)
- ZoeDepth (depth_zoe)
Each preprocessor offers slightly different results, allowing users to choose the most appropriate option for their specific use case. The model demonstrates particular robustness when working with real depth maps from rendering engines, as noted in the ControlNet v1.1 documentation.
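Outside the WebUI, the model can also be driven from Python with Hugging Face's diffusers library. The following is a sketch, not a drop-in script: it assumes the `diffusers/controlnet-depth-sdxl-1.0` checkpoint, a CUDA GPU, and a depth map already saved to disk, and the prompt and conditioning scale are illustrative.

```python
import torch
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline
from diffusers.utils import load_image

# Load the depth ControlNet and attach it to the SDXL base pipeline.
controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-depth-sdxl-1.0",  # assumed hub ID for the full variant
    torch_dtype=torch.float16,
)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# The conditioning image is a grayscale depth map produced by any of the
# supported preprocessors (MiDaS, LeReS, or Zoe); here it is read from disk.
depth_image = load_image("depth_map.png")  # hypothetical local file

image = pipe(
    prompt="a cozy reading nook, soft morning light",
    image=depth_image,
    controlnet_conditioning_scale=0.5,  # how strongly depth steers layout
    num_inference_steps=30,
).images[0]
image.save("output.png")
```

Lower `controlnet_conditioning_scale` values give the prompt more freedom; higher values follow the depth map more strictly.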
The model can be integrated into various frameworks, with particular support for Automatic1111's Stable Diffusion WebUI through the sd-webui-controlnet extension. For optimal performance, the following launch flags are recommended:

- `--medvram-sdxl` for systems with 8GB-16GB of VRAM
- `--lowvram` for systems with less than 8GB of VRAM

Multiple model variants are available for different use cases:
- `diffusers_xl_depth_full.safetensors`
- `diffusers_xl_depth_mid.safetensors`
- `diffusers_xl_depth_small.safetensors`
Place the model files in either `stable-diffusion-webui\extensions\sd-webui-controlnet\models` or `stable-diffusion-webui\models\ControlNet`.
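The VRAM guidance above can be encoded as a small helper. The thresholds mirror the flags listed in this section; the function name itself is hypothetical:

```python
def recommended_flag(vram_gb: float) -> str:
    """Pick the WebUI launch flag suggested for a given amount of VRAM.

    Mirrors the guidance above: --lowvram below 8GB, --medvram-sdxl
    for 8-16GB, and no memory-saving flag when more is available.
    """
    if vram_gb < 8:
        return "--lowvram"
    if vram_gb <= 16:
        return "--medvram-sdxl"
    return ""  # enough VRAM; no extra flag needed

print(recommended_flag(6))  # -> --lowvram
```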