Model Report
lllyasviel / ControlNet SD 1.5 Segmentation
ControlNet SD 1.5 Segmentation is a neural network that enables precise control over Stable Diffusion 1.5 image generation through semantic segmentation maps. The model supports both COCO and ADE20K segmentation protocols, recognizing over 182 segmentation colors to define object placement and scene composition. It maintains architectural consistency with previous ControlNet releases while offering expanded semantic range and improved robustness for controllable image synthesis applications.
ControlNet SD 1.5 Segmentation is a neural network within the ControlNet 1.1 model suite, designed to provide detailed control over image generation through semantic segmentation maps. The model empowers users to direct the content and structure of images synthesized by Stable Diffusion by specifying segmentation masks as conditional inputs. With enhancements over its predecessor, including broader segmentation protocol support and improved robustness, ControlNet SD 1.5 Segmentation is widely used in research and creative applications focused on precise visual composition.
Diagram illustrating the standardized naming convention used for ControlNet 1.1 models, including SD 1.5 Segmentation.
ControlNet SD 1.5 Segmentation enables Stable Diffusion 1.5 to generate images that closely adhere to input semantic segmentation maps. These maps, often produced by applying automated or manual segmentation protocols, divide an input image into distinct regions based on object categories or scene elements. By supplying such a mask, users achieve granular control over object placement, scene composition, and contextual consistency in the generated outputs.
A key advancement in this version is expanded support for the COCO and ADE20K segmentation protocols. This compatibility allows the model to recognize an increased palette of more than 182 segmentation colors from COCO, alongside continued support for approximately 150 colors from ADE20K, significantly broadening the range of controllable semantic categories. The internal encoder is specifically designed for multi-protocol compatibility, allowing for greater versatility and more comprehensive training with diverse data sources.
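As a concrete illustration of this conditioning format, the sketch below renders a per-pixel class-index map into the colored segmentation image the model consumes. The three palette entries are illustrative placeholders rather than the official ADE20K/COCO color tables, which are distributed with the reference preprocessors.

```python
import numpy as np
from PIL import Image

# Placeholder palette: class index -> RGB color. The real ADE20K/COCO palettes
# (roughly 150 and 182+ entries) ship with the ControlNet preprocessors; these
# three entries are only for illustration.
PALETTE = {
    0: (120, 120, 120),  # e.g. "wall"
    1: (180, 120, 120),  # e.g. "building"
    2: (6, 230, 230),    # e.g. "sky"
}

def labels_to_control_image(label_map: np.ndarray) -> Image.Image:
    """Map an (H, W) array of class indices to an RGB control image."""
    h, w = label_map.shape
    color_seg = np.zeros((h, w, 3), dtype=np.uint8)
    for class_id, color in PALETTE.items():
        color_seg[label_map == class_id] = color
    return Image.fromarray(color_seg)

# Toy 512x512 layout: sky on top, building below.
toy = np.zeros((512, 512), dtype=np.uint8)
toy[:200, :] = 2   # sky
toy[200:, :] = 1   # building
labels_to_control_image(toy).save("seg_control.png")
```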
Batch test output of ControlNet SD 1.5 Segmentation using ADE20K segmentation protocol. Prompt: 'house', demonstrating adherence to semantic segmentation in image generation.
The model architecture of ControlNet SD 1.5 Segmentation remains consistent with previous ControlNet releases, supporting backward compatibility and predictability for integrators. The primary model file is control_v11p_sd15_seg.pth, with its configuration specified in control_v11p_sd15_seg.yaml. The developers have indicated that this architectural consistency will persist at least through version 1.5, streamlining updates and ensuring reliability within the ControlNet framework.
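A minimal loading sketch in the style of the reference codebase is shown below; it assumes the lllyasviel/ControlNet-v1-1-nightly repository layout, with the Stable Diffusion 1.5 base checkpoint and the two files named above placed under ./models/.

```python
# Sketch of loading the segmentation ControlNet in the reference codebase.
# Assumes the ControlNet-v1-1-nightly repository (its cldm package) is on the
# Python path and that the checkpoints below exist under ./models/.
from cldm.model import create_model, load_state_dict

model_name = "control_v11p_sd15_seg"

# Build the network from its YAML configuration, then load the Stable Diffusion
# 1.5 base weights followed by the ControlNet weights.
model = create_model(f"./models/{model_name}.yaml").cpu()
model.load_state_dict(load_state_dict("./models/v1-5-pruned.ckpt", location="cuda"), strict=False)
model.load_state_dict(load_state_dict(f"./models/{model_name}.pth", location="cuda"), strict=False)
model = model.cuda()
```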
For training, the model employs a continual learning approach, initializing from weights of the previous Segmentation 1.0 version and refining on a merged dataset that includes both COCO and ADE20K semantic segmentation annotations. This approach leverages the diversity of object and scene labels available in these datasets, fostering generalization over a broader set of visual concepts. The ability to incorporate multiple segmentation protocols further enhances performance in heterogeneous real-world scenarios.
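The merged-dataset setup can be pictured as concatenating two sources that have already been rendered into a shared (control image, target image, caption) format. The sketch below uses hypothetical dataset classes to stand in for the annotation parsing and palette rendering, which are not described in the source documentation.

```python
from torch.utils.data import ConcatDataset, DataLoader

# Hypothetical dataset classes: each yields dicts with a colored segmentation
# control image ("hint"), a target image ("jpg"), and a caption ("txt"). Their
# implementations (annotation parsing, palette rendering) are assumed, not shown.
from my_datasets import ADE20KSegDataset, COCOSegDataset  # hypothetical module

ade20k = ADE20KSegDataset(root="data/ade20k")
coco = COCOSegDataset(root="data/coco-stuff")

# Continual training draws batches from the union of both protocols, so a single
# model learns to read either palette.
merged = ConcatDataset([ade20k, coco])
loader = DataLoader(merged, batch_size=4, shuffle=True, num_workers=4)

for batch in loader:
    hint, target, prompt = batch["hint"], batch["jpg"], batch["txt"]
    # ... forward pass and loss computation against the ControlNet-wrapped UNet ...
    break
```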
Batch test output of ControlNet SD 1.5 Segmentation using COCO segmentation protocol. Prompt: 'house'. The results demonstrate control over architectural features based on the segmentation map.
While the developers have not published formal benchmark metrics, qualitative documentation and openly shared tests suggest meaningful improvements in both versatility and reliability over the prior release. The incorporation of new segmentation protocols increases the model's semantic range, allowing for finer object separation and a wider range of scene layouts in synthesis. Batch test outputs, generated without cherry-picking, demonstrate that image generation remains tightly aligned with the structures and layouts defined by both ADE20K and COCO segmentation maps.
The model exhibits robustness in maintaining correspondences between segmented input regions and the resulting image's content. Test results indicate consistent translation of segmentation-defined objects and contexts—such as architectural details in house generation—across different seeds and input maps, enabling reproducible and precise outputs for varied use cases.
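One way to reproduce such seed-controlled batch tests is sketched below using the diffusers library; it assumes a diffusers-format copy of the checkpoint is published under the Hugging Face id lllyasviel/control_v11p_sd15_seg and uses runwayml/stable-diffusion-v1-5 as the base model.

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline, UniPCMultistepScheduler
from diffusers.utils import load_image

# Assumes a diffusers-format copy of the checkpoint is available under this id.
controlnet = ControlNetModel.from_pretrained("lllyasviel/control_v11p_sd15_seg", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
pipe.to("cuda")

# Colored segmentation map, e.g. produced by the palette sketch above.
control_image = load_image("seg_control.png")

# Fixed seeds make the segmentation-to-image correspondence easy to compare.
for seed in (12345, 12346, 12347):
    generator = torch.Generator(device="cuda").manual_seed(seed)
    image = pipe("house", image=control_image, num_inference_steps=20, generator=generator).images[0]
    image.save(f"house_seed{seed}.png")
```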
Applications and Integration
ControlNet SD 1.5 Segmentation is employed in tasks that demand explicit spatial and semantic control over image generation and manipulation. Its primary application is the guided synthesis of images where users define the layout and identities of objects in a scene via semantic masks. This approach is especially valuable for digital content creation, design prototyping, and research explorations into controllable generative models.
The model accepts segmentation masks produced by a range of automated pre-processors, including systems leveraging Oneformer ADE20K, Oneformer COCO, and Uniformer pipelines, as well as hand-crafted input masks. ControlNet's design supports seamless integration with the broader Stable Diffusion ecosystem and popular user interface extensions, facilitating workflows that combine multiple control modalities.
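As a hedged example of one such automated preprocessing step, the sketch below runs an OneFormer ADE20K model through the transformers library to produce a control image; the shi-labs/oneformer_ade20k_swin_tiny checkpoint is one publicly available variant, and the coloring step reuses the placeholder palette function from the earlier sketch rather than the full ADE20K color table.

```python
import torch
from PIL import Image
from transformers import OneFormerProcessor, OneFormerForUniversalSegmentation

# OneFormer ADE20K checkpoint; other OneFormer or UniFormer variants can be swapped in.
processor = OneFormerProcessor.from_pretrained("shi-labs/oneformer_ade20k_swin_tiny")
segmentor = OneFormerForUniversalSegmentation.from_pretrained("shi-labs/oneformer_ade20k_swin_tiny")

image = Image.open("photo.jpg").convert("RGB")
inputs = processor(images=image, task_inputs=["semantic"], return_tensors="pt")
with torch.no_grad():
    outputs = segmentor(**inputs)

# (H, W) map of ADE20K class indices at the original resolution.
label_map = processor.post_process_semantic_segmentation(
    outputs, target_sizes=[image.size[::-1]]
)[0].numpy()

# Render class indices to colors; the resulting RGB image is the control input
# passed to the segmentation ControlNet. labels_to_control_image is the helper
# defined in the earlier palette sketch (placeholder palette only).
control_image = labels_to_control_image(label_map)
control_image.save("seg_control.png")
```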
Model Family and Related Work
ControlNet 1.1 includes a suite of models, each architecturally unified with SD 1.5 Segmentation but specializing in different conditioning modalities. These include models for depth map control, normal map conditioning using Bae's method, edge detection, scribble interpretation, lineart, and more. Experimental variants such as Shuffle, Instruct Pix2Pix, and Tile introduce new paradigms for content reorganization and guided inpainting. This modularity allows users to chain or combine different control signals, subject to interface support, to achieve compounded compositional control.
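In the diffusers library, for instance, such chaining is expressed by passing a list of ControlNet models and matching control images to a single pipeline. The sketch below pairs the segmentation model with the depth model from the same family; the repository ids are assumed to host diffusers-format weights.

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Repository ids are assumed to host diffusers-format weights for the 1.1 family.
seg = ControlNetModel.from_pretrained("lllyasviel/control_v11p_sd15_seg", torch_dtype=torch.float16)
depth = ControlNetModel.from_pretrained("lllyasviel/control_v11f1p_sd15_depth", torch_dtype=torch.float16)

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=[seg, depth], torch_dtype=torch.float16
).to("cuda")

seg_image = load_image("seg_control.png")
depth_image = load_image("depth_control.png")

# Each control signal gets its own conditioning image and weight.
image = pipe(
    "house",
    image=[seg_image, depth_image],
    controlnet_conditioning_scale=[1.0, 0.8],
    num_inference_steps=20,
).images[0]
image.save("house_multicontrol.png")
```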
Limitations and Considerations
Despite its expanded capabilities, some technical limitations are noted in the documentation. Official "multi-ControlNet" use, combining several control signals in parallel, is supported only in specific interface extensions, necessitating bespoke implementation in alternative environments. Certain models in the suite, such as Shuffle and Instruct Pix2Pix, are classified as experimental and may exhibit instability or require further fine-tuning. Additionally, custom integrations must adhere to architectural conventions such as applying global average pooling between encoder outputs and Stable Diffusion’s UNet layers for correct operation. The anime lineart model further requires external weights that are not bundled with the main release.