The simplest way to self-host SDXL Lightning. Launch a dedicated cloud GPU server running Lab Station OS to download and serve the model using any compatible app or framework.
Download model weights for local inference. Must be used with a compatible app, notebook, or codebase. May run slowly, or not work at all, depending on your system resources, particularly GPU(s) and available VRAM.
SDXL Lightning is a distilled version of Stable Diffusion XL that generates images in 1-8 steps while maintaining output quality. It uses progressive adversarial distillation with SDXL's UNet encoder as a discriminator, enabling efficient latent space processing across all timesteps.
SDXL-Lightning represents a significant advancement in text-to-image generation, building upon the foundation of Stable Diffusion XL (SDXL). Detailed in the accompanying research paper, "SDXL-Lightning: Progressive Adversarial Diffusion Distillation," the model introduces a distillation method that effectively balances image quality and mode coverage.
The model's architecture employs a unique combination of progressive and adversarial distillation techniques. Unlike previous approaches that rely on MSE loss—which often results in blurry outputs during few-step generation—SDXL-Lightning implements adversarial loss at each distillation stage. This ensures the distilled model maintains both the probability flow and mode coverage of the original SDXL model.
A key architectural innovation is the utilization of the pre-trained SDXL UNet encoder as the discriminator backbone. This design choice enables efficient distillation in latent space and supports discrimination across all timesteps, resulting in improved computational efficiency and generalizability compared to methods using conventional off-the-shelf encoders.
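The contrast between MSE distillation and adversarial distillation can be sketched with toy PyTorch modules. Everything below is a deliberately simplified illustration: the real model uses the full SDXL UNet (and its encoder as the discriminator backbone), whereas here small linear layers stand in so the loss structure is visible. The non-saturating GAN losses are one standard choice, shown for illustration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy stand-ins (assumption: real SDXL-Lightning operates on SDXL latents
# with the full UNet; shapes and modules here are illustrative only).
student = torch.nn.Linear(16, 16)    # few-step model being distilled
teacher = torch.nn.Linear(16, 16)    # frozen multi-step teacher
disc = torch.nn.Sequential(torch.nn.Linear(16, 8), torch.nn.ReLU(), torch.nn.Linear(8, 1))

x_t = torch.randn(4, 16)             # noisy latents at some timestep t

with torch.no_grad():
    target = teacher(x_t)            # teacher's multi-step result, collapsed to one call here
pred = student(x_t)                  # student must cover the same span in a single jump

# Plain MSE distillation averages over all plausible outputs, which is what
# produces blurry few-step samples.
mse_loss = F.mse_loss(pred, target)

# Adversarial distillation instead trains a discriminator to tell the student's
# prediction from the teacher's, pushing the student onto sharp, plausible modes.
d_real = disc(target)
d_fake = disc(pred)
adv_student_loss = F.softplus(-d_fake).mean()                               # non-saturating G loss
adv_disc_loss = F.softplus(-d_real).mean() + F.softplus(d_fake.detach()).mean()
```

In the actual method this adversarial objective is applied at every stage of the progressive schedule and at multiple timesteps, which is what preserves the teacher's probability flow and mode coverage.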
The training process for SDXL-Lightning utilized a carefully curated subset of the LAION and COYO datasets, specifically selecting high-resolution images (>1024px) with high aesthetic scores and sharpness. The distillation process progresses through multiple stages, starting from 128 steps and gradually reducing to a single step.
Several advanced training techniques were employed during distillation to optimize performance and manage memory usage.
The model also incorporates various stabilization techniques, including training the student network and discriminator at multiple timesteps and switching to x0 prediction for one-step generation.
SDXL-Lightning is available in multiple variants, offering both full UNet and LoRA (Low-Rank Adaptation) implementations, with checkpoints provided for 1-step, 2-step, 4-step, and 8-step inference.
The 2-step, 4-step, and 8-step variants demonstrate superior image quality, while the 1-step model is considered experimental and less stable. Full UNet models provide the highest image quality, though LoRA variants offer convenient plug-and-play functionality and compatibility with various base models and plugins like ControlNet.
Performance benchmarks show SDXL-Lightning significantly outperforming previous open-source distillation models such as SDXL-Turbo and LCM in terms of image quality and detail preservation. This is demonstrated through both qualitative assessments and quantitative metrics using FID (whole image and patch-based) and CLIP scores.
The model is released under an OpenRAIL++ license and is available through the Hugging Face repository. One notable limitation is the requirement for a separate checkpoint for each inference-step setting, though this is partially mitigated by the LoRA implementations. The architecture may not be optimal for one-step generation, suggesting potential for future improvements.