ControlNet 1.5 IP Adapter combines ControlNet's image conditioning with IP-Adapter technology to enable dual guidance through both reference images and text prompts. It uses OpenCLIP-ViT encoders and features smart resampling for maintaining control image quality across resolutions, with variants optimized for general use and face-specific applications.
ControlNet 1.5 IP Adapter builds on the ControlNet architecture and adds the image prompt adaptation introduced by the IP-Adapter research, so a reference image can steer generation alongside a text prompt.
The model maintains the core ControlNet architecture while integrating a lightweight image prompt adapter that adds only 22M parameters. It leverages pre-trained image encoders, specifically OpenCLIP-ViT-H-14 (632.08M parameters) or OpenCLIP-ViT-bigG-14 (1844.9M parameters), to process image inputs. The architecture employs a two-stage training strategy: initial pre-training at 512x512 resolution followed by multi-scale fine-tuning, which proves more efficient than direct 1024x1024 training.
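To put the "lightweight" claim in proportion, the short sketch below compares the 22M adapter parameters with the two encoder sizes quoted above. The constant and function names are illustrative only and are not part of any released code.

```python
# Parameter counts quoted in the text above, in millions.
ADAPTER_PARAMS_M = 22.0
ENCODER_PARAMS_M = {
    "OpenCLIP-ViT-H-14": 632.08,
    "OpenCLIP-ViT-bigG-14": 1844.9,
}


def adapter_share(encoder_params_m, adapter_params_m=ADAPTER_PARAMS_M):
    """Return the adapter size as a percentage of the image encoder size."""
    return 100.0 * adapter_params_m / encoder_params_m


if __name__ == "__main__":
    for name, params in ENCODER_PARAMS_M.items():
        print(f"{name}: adapter is {adapter_share(params):.2f}% of the encoder size")
```

With the figures above, the adapter comes to roughly 3.5% of the ViT-H-14 encoder and a little over 1% of the ViT-bigG-14 encoder.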
The model is available in two primary variants: ip-adapter_sd15.pth and ip-adapter_sd15_plus.pth, each optimized for Stable Diffusion 1.5. The Plus variant requires specific configuration, notably a CFG scale of approximately 2 for optimal results. Both versions can be installed in either the stable-diffusion-webui\extensions\sd-webui-controlnet\models or stable-diffusion-webui\models\ControlNet directories.
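A quick way to confirm that the weights landed in one of the two directories named above is to check both locations. The following is a minimal sketch that assumes a standard stable-diffusion-webui checkout; the helper name find_installed_models and the webui_root argument are hypothetical.

```python
from pathlib import Path

# File names and directories taken from the text above.
MODEL_FILES = ("ip-adapter_sd15.pth", "ip-adapter_sd15_plus.pth")
MODEL_DIRS = (
    Path("extensions") / "sd-webui-controlnet" / "models",
    Path("models") / "ControlNet",
)


def find_installed_models(webui_root):
    """Map each model file name to the first location where it exists.

    webui_root is the path to the stable-diffusion-webui folder
    (an assumption about the local setup).
    """
    root = Path(webui_root)
    found = {}
    for rel_dir in MODEL_DIRS:
        for name in MODEL_FILES:
            candidate = root / rel_dir / name
            if name not in found and candidate.is_file():
                found[name] = candidate
    return found


if __name__ == "__main__":
    installed = find_installed_models("stable-diffusion-webui")
    for name in MODEL_FILES:
        print(name, "->", installed.get(name, "not found"))
```

Building the paths with pathlib keeps the check portable, even though the text above writes them with Windows-style separators.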
A key strength of the ControlNet 1.5 IP Adapter is its ability to process various types of image inputs while maintaining high-quality output generation. The model implements smart resampling algorithms to ensure pixel-perfect control images regardless of resolution, making it particularly effective for manually created control images.
The model works with existing controllable generation tools and carries over to custom models fine-tuned from the same base model. It performs best with square reference images because the default CLIP image processor center-crops its input, so anything outside the central square of a non-square image is lost. Non-square images can still be used by resizing the whole image to 224x224 first, which keeps all of the content at the cost of distorting the aspect ratio.
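The effect of center cropping on a non-square reference image can be estimated with simple arithmetic. The sketch below is an approximation of the geometry only, not the actual CLIP preprocessing code, and the function names are illustrative.

```python
def center_crop_box(width, height):
    """Return (left, top, right, bottom) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


def kept_fraction(width, height):
    """Return the fraction of the image area that survives a center crop."""
    side = min(width, height)
    return (side * side) / (width * height)


if __name__ == "__main__":
    w, h = 768, 512  # example landscape reference image
    print("crop box:", center_crop_box(w, h))
    print(f"area kept: {kept_fraction(w, h):.0%}")
```

For a 768x512 image, one third of the area falls outside the crop, which is why resizing the full image to 224x224 can be the better option when the edges contain important content.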
The model has been tested across different hardware configurations and runs on GPUs with both 8GB and 6GB of VRAM. For systems with 6GB VRAM, float16 support is required. Users with 8GB-16GB VRAM should use the --medvram-sdxl command-line flag, while those with less than 8GB should use the --lowvram option.
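The VRAM guidance above can be written as a small lookup. This is a sketch under stated assumptions: the text gives no recommendation above 16GB, so no flag is returned there, and treating "6GB" as "6GB or less" for the float16 requirement is a guess. Both function names are hypothetical.

```python
def suggest_launch_flags(vram_gb):
    """Return the command-line flags the text above suggests for a VRAM size."""
    if vram_gb < 8:
        return ["--lowvram"]
    if vram_gb <= 16:
        return ["--medvram-sdxl"]
    # Assumption: above 16GB the text gives no guidance, so add no flag.
    return []


def needs_float16(vram_gb):
    """Assumption: the 6GB float16 requirement also applies below 6GB."""
    return vram_gb <= 6


if __name__ == "__main__":
    for vram in (6, 8, 12, 24):
        flags = " ".join(suggest_launch_flags(vram)) or "(no extra flag)"
        print(f"{vram}GB: {flags}, float16 required: {needs_float16(vram)}")
```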
The implementation demonstrates significant efficiency improvements over previous versions, particularly in the context of the broader ControlNet family. The model is released under the Apache-2.0 license, enabling broad accessibility for both research and application development.