The simplest way to self-host Stable Diffusion 1.1. Launch a dedicated cloud GPU server running Lab Station OS to download and serve the model using any compatible app or framework.
Direct Download
Download model weights for local inference. Must be used with a compatible app, notebook, or codebase. May run slowly, or not work at all, depending on your system resources, particularly GPU(s) and available VRAM.
Forge is a platform built on top of Stable Diffusion WebUI to make development easier, optimize resource management, speed up inference, and study experimental features.
Train your own LoRAs and finetunes for Stable Diffusion and Flux using this popular GUI for the Kohya trainers.
Model Report
stabilityai / Stable Diffusion 1.1
Stable Diffusion 1.1 is a text-to-image model using a latent diffusion architecture. It combines an 860M parameter U-Net with a CLIP text encoder, trained on LAION datasets. Notable for efficient image generation through latent space compression, it supports text-to-image, image editing, and inpainting functions.
Stable Diffusion 1.1 is a latent text-to-image diffusion model that generates photorealistic images from text input. The model uses a Latent Diffusion Model (LDM) architecture, which operates in the compressed latent space of a pretrained autoencoder rather than directly in pixel space. Operating in latent space substantially reduces compute and memory requirements while maintaining high visual fidelity.
The model architecture consists of several key components:
An 860M parameter U-Net for denoising the latent representation
A 123M parameter CLIP ViT-L/14 text encoder for processing text prompts
A variational autoencoder (VAE) for compressing images into latent space
Cross-attention layers enabling flexible conditioning on various inputs
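The savings from the latent-space design follow from simple shape arithmetic. The sketch below assumes the standard SD 1.x configuration (8x spatial downsampling by the VAE and 4 latent channels, per the published architecture):

```python
# Sketch: how much smaller the latent representation is than the pixel image,
# using the standard Stable Diffusion 1.x shapes (8x downsampling, 4 channels).
H, W, C = 512, 512, 3          # pixel-space image
f, zc = 8, 4                   # VAE downsampling factor and latent channels

pixel_elems = H * W * C                      # 512 * 512 * 3 = 786,432 values
latent_elems = (H // f) * (W // f) * zc      # 64 * 64 * 4 = 16,384 values

ratio = pixel_elems / latent_elems
print(f"latent shape: {H // f}x{W // f}x{zc}, {ratio:.0f}x fewer values")
```

The U-Net therefore denoises a 64x64x4 tensor rather than the full 512x512x3 image, which is the main source of the reduced compute and VRAM requirements.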
Training proceeded in two stages: 237,000 steps at 256x256 resolution on the laion2B-en dataset, followed by 194,000 steps at 512x512 resolution on the LAION-high-resolution dataset (170M examples from LAION-5B with resolution ≥ 1024x1024)
Performance benchmarks show improvements across successive versions (v1.1-v1.4) in terms of FID and CLIP scores. The estimated carbon emissions for training Stable Diffusion v1.1 were approximately 11,250 kg CO2 eq, as calculated using the Machine Learning Impact calculator.
guidance_scale: typically 7.5 (adjustable based on application)
Resolution: 512x512 (native training resolution)
Scheduler options: PNDM, K-LMS, or others
Precision: float16 available for lower GPU memory requirements
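How guidance_scale enters sampling can be illustrated with classifier-free guidance, which Stable Diffusion applies at each denoising step. This is a minimal numpy sketch of the combination step only, not the actual pipeline API; the array shapes are placeholders matching the 64x64x4 latent:

```python
import numpy as np

def apply_cfg(eps_uncond, eps_cond, guidance_scale=7.5):
    """Classifier-free guidance: push the noise prediction away from the
    unconditional estimate and toward the text-conditioned one."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# Toy noise predictions shaped like the latent (batch 1, 4 channels, 64x64).
rng = np.random.default_rng(0)
eps_u = rng.standard_normal((1, 4, 64, 64))
eps_c = rng.standard_normal((1, 4, 64, 64))

guided = apply_cfg(eps_u, eps_c, guidance_scale=7.5)

# A scale of 1.0 reduces to the conditional prediction alone; larger values
# trade diversity for closer adherence to the prompt.
assert np.allclose(apply_cfg(eps_u, eps_c, 1.0), eps_c)
```

Higher guidance_scale values make outputs follow the prompt more literally at the cost of variety, which is why 7.5 is a common middle-ground default.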
Limitations and License
The model has several known limitations:
Imperfect photorealism
Inability to render legible text
Difficulty with compositional tasks
Potential issues with face and human generation
Bias towards English and Western cultures
The model is released under the CreativeML OpenRAIL-M license, which permits commercial redistribution but restricts the generation and sharing of illegal or harmful content.