The simplest way to self-host ControlNet SDXL IP Adapter is to launch a dedicated cloud GPU server running Lab Station OS, which downloads and serves the model for use with any compatible app or framework.
Alternatively, you can download the model weights for local inference. They must be used with a compatible app, notebook, or codebase, and may run slowly, or not at all, depending on your system resources, particularly your GPU(s) and available VRAM.
ControlNet SDXL IP Adapter combines image and text guidance for controlled image generation. It uses OpenCLIP image encoders and supports multiple control methods (edges, depth, poses, etc.) while staying efficient at only 22M adapter parameters, and it can work alongside other ControlNet models for complex image manipulations.
The ControlNet SDXL IP Adapter represents a significant advancement in controllable image generation, combining the ControlNet architecture with the IP-Adapter technique developed by Tencent AI Lab. The model enables precise control over image generation through reference images while remaining compatible with Stable Diffusion XL's powerful base capabilities.
The model builds upon the ControlNet 1.1 framework, which maintains the same core architecture as ControlNet 1.0 for long-term stability. It integrates IP-Adapter technology, which adds only 22M parameters while enabling image prompt capabilities. The system processes input images through pre-trained annotators (such as HED and OpenPose) to generate guidance signals for the diffusion process, as detailed in the original ControlNet research.
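To make the annotator step concrete, here is a minimal sketch using the community `controlnet_aux` package to turn a photo into an OpenPose control map. The package, repo id, and file names are illustrative assumptions, not something this page prescribes:

```python
from PIL import Image
from controlnet_aux import OpenposeDetector

# Load a pre-trained OpenPose annotator (the repo id is an illustrative
# assumption, not taken from this page).
openpose = OpenposeDetector.from_pretrained("lllyasviel/Annotators")

source = Image.open("photo.png").convert("RGB")
pose_map = openpose(source)  # a skeleton image usable as a ControlNet condition
pose_map.save("pose_control.png")
```

The same pattern applies to other annotators (e.g. HED edge detection): the annotator converts an ordinary image into the guidance signal the diffusion process consumes.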
The architecture leverages either OpenCLIP-ViT-H-14 or OpenCLIP-ViT-bigG-14 as image encoders, depending on the specific variant. These encoders generate image embeddings that serve as conditions for the diffusion model. The system supports both global image embeddings and patch image embeddings, with the latter providing finer control and closer adherence to reference images.
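As a sketch of the encoder pathway, the code below loads a CLIP ViT-H image encoder and extracts both a global embedding and per-patch features. The repo id, subfolder, and file names are assumptions about where such weights are commonly hosted, not details from this page:

```python
import torch
from PIL import Image
from transformers import CLIPImageProcessor, CLIPVisionModelWithProjection

# Load an OpenCLIP ViT-H image encoder (repo id and subfolder are assumptions).
processor = CLIPImageProcessor()
encoder = CLIPVisionModelWithProjection.from_pretrained(
    "h94/IP-Adapter", subfolder="models/image_encoder", torch_dtype=torch.float16
).to("cuda")

reference = Image.open("reference.png").convert("RGB")
pixel_values = processor(images=reference, return_tensors="pt").pixel_values
pixel_values = pixel_values.to("cuda", dtype=torch.float16)

with torch.no_grad():
    # One pooled vector per image: the "global" embedding described above.
    global_embed = encoder(pixel_values).image_embeds
    # Per-patch hidden states, from which patch-level embeddings are derived
    # for finer-grained control.
    patch_states = encoder(pixel_values, output_hidden_states=True).hidden_states[-2]
```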
ControlNet SDXL IP Adapter offers several key capabilities. The model supports multimodal image generation, allowing users to combine image and text prompts effectively, and it can be used alongside Stability AI's Control-LoRAs and the Diffusers library for enhanced control over the generation process.
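A minimal Diffusers sketch of this multimodal workflow pairs a ControlNet structural condition with an IP-Adapter reference image. All repo ids, file names, and prompts below are illustrative assumptions:

```python
import torch
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline
from diffusers.utils import load_image

# Repo ids, file names, and prompts are illustrative assumptions.
controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-depth-sdxl-1.0", torch_dtype=torch.float16
)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# Attach the IP-Adapter weights so the pipeline accepts an image prompt.
pipe.load_ip_adapter(
    "h94/IP-Adapter", subfolder="sdxl_models", weight_name="ip-adapter_sdxl.bin"
)
pipe.set_ip_adapter_scale(0.5)  # balance image guidance against text guidance

style_image = load_image("reference_style.png")  # IP-Adapter reference image
depth_map = load_image("depth_control.png")      # ControlNet structural condition

result = pipe(
    prompt="a cozy reading nook, warm morning light",
    image=depth_map,
    ip_adapter_image=style_image,
    num_inference_steps=30,
).images[0]
result.save("output.png")
```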
For optimal performance in the Stable Diffusion web UI, launch with the flag that matches your available VRAM (a rough Diffusers analogue is sketched after the paths below):

- `--medvram-sdxl` for systems with 8GB-16GB of VRAM
- `--lowvram` for systems with less than 8GB of VRAM

The model files should be placed in either:

- `stable-diffusion-webui\extensions\sd-webui-controlnet\models`
- `stable-diffusion-webui\models\ControlNet`
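For users running the model through Diffusers rather than the web UI, CPU offloading serves a similar purpose to these flags. The mapping below is an assumption of comparable intent, not an exact equivalence, and `pipe` is the pipeline from the earlier sketch:

```python
# "pipe" is the StableDiffusionXLControlNetPipeline built in the sketch above.
# Rough analogues of the web UI launch flags (an analogy, not an equivalence):
pipe.enable_model_cpu_offload()         # moderate savings, in the spirit of --medvram-sdxl
# pipe.enable_sequential_cpu_offload()  # aggressive savings for <8GB cards, much slower
```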
When using the model, recommended parameters include (see the sketch below):

- `scale=1.0` for image-only prompts
- `scale=0.5` for multimodal prompts (combining image and text)

The model is released under the Apache-2.0 license, making it accessible for both research and practical applications.
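In Diffusers, these scale recommendations map onto `set_ip_adapter_scale`. A short sketch reusing the `pipe`, `depth_map`, and `style_image` objects from the earlier example (the prompts are illustrative):

```python
# Image-only prompting: let the reference image dominate.
pipe.set_ip_adapter_scale(1.0)
image_only = pipe(prompt="", image=depth_map,
                  ip_adapter_image=style_image).images[0]

# Multimodal prompting: balance the reference image against the text prompt.
pipe.set_ip_adapter_scale(0.5)
multimodal = pipe(prompt="a watercolor landscape at dusk", image=depth_map,
                  ip_adapter_image=style_image).images[0]
```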