The simplest way to self-host ControlNet SDXL Diffusers Canny. Launch a dedicated cloud GPU server running Lab Station OS to download and serve the model using any compatible app or framework.
Download model weights for local inference. Must be used with a compatible app, notebook, or codebase. May run slowly, or not work at all, depending on your system resources, particularly GPU(s) and available VRAM.
ControlNet SDXL Diffusers Canny enables structural control over Stable Diffusion XL image generation using edge detection maps. It features three control modes to balance text prompts with edge guidance, and offers both full-size and compressed LoRA variants for efficient deployment.
ControlNet SDXL Diffusers Canny is an edge detection model within the ControlNet framework designed specifically for Stable Diffusion XL (SDXL) image generation. The model processes input images to create "detectmaps" - simplified representations highlighting areas of high contrast through sharp lines. These detectmaps then guide SDXL's generation process, allowing it to maintain the structural composition of input images while enabling creative variations through text prompts.
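As a rough illustration of what a detectmap contains, the sketch below builds a binary edge image from a small grayscale grid. It is a simplified stand-in rather than the full Canny algorithm: it keeps the gradient-magnitude step and the two-threshold hysteresis step, but omits Gaussian smoothing and non-maximum suppression. The function name edge_map and the threshold values are illustrative assumptions, not part of the model or any library.

```python
def edge_map(gray, low, high):
    """Return a binary edge map (255 = edge, 0 = background).

    `gray` is a 2D list of intensities. Simplified sketch: central-difference
    gradients plus hysteresis thresholding, without smoothing or thinning.
    """
    height, width = len(gray), len(gray[0])

    # Gradient magnitude from central differences (border pixels stay 0).
    magnitude = [[0.0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gray[y][x + 1] - gray[y][x - 1]
            gy = gray[y + 1][x] - gray[y - 1][x]
            magnitude[y][x] = (gx * gx + gy * gy) ** 0.5

    # Hysteresis: start from strong pixels (>= high) and grow into
    # neighbouring weak pixels (>= low).
    edges = [[0] * width for _ in range(height)]
    stack = [(y, x) for y in range(height) for x in range(width)
             if magnitude[y][x] >= high]
    while stack:
        y, x = stack.pop()
        if edges[y][x]:
            continue
        edges[y][x] = 255
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if (0 <= ny < height and 0 <= nx < width
                        and not edges[ny][nx] and magnitude[ny][nx] >= low):
                    stack.append((ny, nx))
    return edges


# A 6x6 image: dark left half, bright right half (one vertical boundary).
image = [[0, 0, 0, 255, 255, 255] for _ in range(6)]
detectmap = edge_map(image, low=50, high=100)
```

Only the pixels along the dark-to-bright boundary are marked, which is the kind of high-contrast outline the detectmap passes on to SDXL as structural guidance.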
The model builds upon the architecture established in the original ControlNet research paper, which introduced the concept of adding conditional control to text-to-image diffusion models. Version 1.1 maintains architectural stability (planned through version 1.5) while delivering improved robustness and image quality compared to its predecessor.
Version 1.1 was trained on 8 Nvidia A100 80G GPUs for 3 days with a batch size of 256, at a reported cost of 2160 USD. The training data incorporated edge maps generated with randomized Canny edge detection thresholds, and the dataset was cleaned to address issues present in version 1.0, such as duplicate images and low-quality data.
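The training figures above can be turned into a few derived quantities. The short calculation below uses only the numbers quoted in the text; it assumes the 2160 USD covers exactly 3 full days (72 hours) on all 8 GPUs and that the batch of 256 is split evenly across them, neither of which is stated explicitly here.

```python
GPUS = 8
DAYS = 3
TOTAL_COST_USD = 2160
BATCH_SIZE = 256

wall_clock_hours = DAYS * 24                        # hours the job ran
gpu_hours = GPUS * wall_clock_hours                 # total GPU-hours consumed
cluster_rate = TOTAL_COST_USD / wall_clock_hours    # USD per hour for all 8 GPUs
per_gpu_rate = TOTAL_COST_USD / gpu_hours           # USD per GPU-hour
per_gpu_batch = BATCH_SIZE // GPUS                  # samples per GPU per step (assumed even split)

print(wall_clock_hours, gpu_hours, cluster_rate, per_gpu_rate, per_gpu_batch)
```

Under those assumptions the run amounts to 576 GPU-hours at 3.75 USD per GPU-hour.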
A notable innovation in the model family is the Control-LoRA variant, which applies low-rank adaptation to achieve significant size reduction. While a standard ControlNet model might be 4.7GB, a rank 256 Control-LoRA reduces this to approximately 738MB, with rank 128 versions further reducing size to around 377MB. This efficiency makes the model more accessible for consumer-grade GPUs.
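To see why a lower rank shrinks the file, the sketch below compares the parameter count of a full weight matrix with that of a generic low-rank factorization, and computes the size ratios from the figures quoted above. The 1280x1280 layer shape is a hypothetical example, not a documented dimension of this model, and the ratio calculation assumes 1 GB = 1000 MB.

```python
def full_params(d_out, d_in):
    """Parameters in a dense d_out x d_in weight matrix."""
    return d_out * d_in


def low_rank_params(d_out, d_in, rank):
    """Parameters in a rank-`rank` factorization: (d_out x rank) + (rank x d_in)."""
    return rank * (d_out + d_in)


# Hypothetical layer shape, chosen only for illustration.
d_out, d_in = 1280, 1280
dense = full_params(d_out, d_in)
rank_256 = low_rank_params(d_out, d_in, 256)
rank_128 = low_rank_params(d_out, d_in, 128)

# File-size ratios from the figures in the text (4.7GB, 738MB, 377MB).
ratio_256 = round(4700 / 738, 1)
ratio_128 = round(4700 / 377, 1)

print(dense, rank_256, rank_128, ratio_256, ratio_128)
```

Halving the rank halves the factorized parameter count exactly, which is consistent with the rank 128 file being close to half the size of the rank 256 file.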
The model's primary strength lies in its ability to precisely guide image generation through edge detection while maintaining flexibility in artistic interpretation. A key feature is its intelligent resampling algorithm, which ensures pixel-perfect accuracy of control images regardless of output resolution - particularly valuable when working with pre-processed or manually created edge maps.
Within the broader ControlNet family, Canny is one of several specialized variants, each conditioned on a different type of control image.
The model is primarily implemented through the sd-webui-controlnet extension for Automatic1111's Stable Diffusion WebUI. For optimal performance, users with 8GB-16GB VRAM should use the --medvram-sdxl command-line flag, while those with less than 8GB should opt for --lowvram.
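The VRAM guidance above can be written as a small lookup. The helper below is a hypothetical convenience function, not part of the WebUI or the extension; it simply encodes the thresholds stated in the text and assumes that no memory flag is needed above 16GB.

```python
from typing import Optional


def vram_flag(vram_gb: float) -> Optional[str]:
    """Map available VRAM (in GB) to the suggested WebUI command-line flag."""
    if vram_gb < 8:
        return "--lowvram"
    if vram_gb <= 16:
        return "--medvram-sdxl"
    return None  # assumption: no memory-saving flag needed above 16GB


def launch_args(vram_gb: float) -> list:
    """Build the list of extra command-line arguments for a given VRAM size."""
    flag = vram_flag(vram_gb)
    return [flag] if flag else []


print(launch_args(6), launch_args(12), launch_args(24))
```

The resulting flag would then be passed to the WebUI at launch along with any other arguments already in use.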
The interface offers a "Control Mode" setting with three options: "Balanced", "My prompt is more important", and "ControlNet is more important".
These settings help users fine-tune the balance between text prompts and control images, particularly useful when addressing "flat" results.