Model Report
HiDream-ai / HiDream I1 Full
HiDream I1 Full is an open-source image generation model developed by HiDream.ai featuring a 17 billion parameter sparse Diffusion Transformer architecture with Mixture-of-Experts design. The model employs hybrid text encoding combining Long-CLIP, T5-XXL, and Llama 3.1 8B components for precise text-to-image synthesis. It demonstrates strong performance on industry benchmarks and supports diverse visual styles through flow-matching in latent space.
HiDream-I1 Full is an open-source image generation foundation model developed by HiDream.ai, notable for its 17 billion parameter architecture and optimized for high-quality, efficient image synthesis. The system incorporates transformer-based techniques and hybrid text encoding, supporting output across multiple visual styles with strong prompt fidelity. HiDream-I1 Full serves as the core model in a broader family of visual intelligence and generative tools, with applications in creative content production and interactive image editing.
A diverse grid of output images generated by HiDream-I1 Full, illustrating photorealism, stylized portraiture, cartoon anthropomorphism, text depiction, and fantasy character design. The central graffiti artwork highlights the model's branding.
HiDream-I1 Full is built on a novel sparse Diffusion Transformer (DiT) architecture that combines computational efficiency with scalability for high-fidelity image generation. At the input stage, a dual-stream, decoupled design processes image and text tokens independently, each stream passing through a dynamic Mixture-of-Experts (MoE) arrangement. A subsequent single-stream, sparse DiT stage, also equipped with MoE, enables efficient multimodal interaction.
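In outline, the two stages can be pictured as per-modality transformer blocks followed by joint blocks over the concatenated token sequence. The following PyTorch sketch is purely structural, with placeholder dimensions and standard attention blocks standing in for the model's sparse MoE layers; it is not the published implementation.

```python
import torch
import torch.nn as nn

class DualThenSingleStream(nn.Module):
    """Stage 1: image and text tokens pass through their own transformer blocks.
    Stage 2: the two streams are concatenated and processed jointly."""
    def __init__(self, dim=512, heads=8, dual_layers=2, single_layers=2):
        super().__init__()
        def block():
            return nn.TransformerEncoderLayer(dim, heads, batch_first=True, norm_first=True)
        self.image_blocks = nn.ModuleList([block() for _ in range(dual_layers)])
        self.text_blocks = nn.ModuleList([block() for _ in range(dual_layers)])
        self.joint_blocks = nn.ModuleList([block() for _ in range(single_layers)])

    def forward(self, image_tokens, text_tokens):
        for img_blk, txt_blk in zip(self.image_blocks, self.text_blocks):
            image_tokens = img_blk(image_tokens)   # modality-specific processing
            text_tokens = txt_blk(text_tokens)
        joint = torch.cat([image_tokens, text_tokens], dim=1)   # single multimodal stream
        for blk in self.joint_blocks:
            joint = blk(joint)
        return joint

model = DualThenSingleStream()
out = model(torch.randn(1, 64, 512), torch.randn(1, 32, 512))
print(out.shape)   # torch.Size([1, 96, 512])
```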
The MoE incorporates lightweight gating networks to dynamically assign incoming tokens to a subset of specialized feed-forward network experts, guided by a routing mechanism and referencing a shared global expert. Activation within these expert modules employs SwiGLU for improved learning capacity. Each transformer block conditions on pooled Long-CLIP text-image features and sinusoidal timestep embeddings, introduced via adaptive layer normalization for stable multimodal integration. Self-attention blocks apply QK-normalization to enhance training robustness and convergence, as described in the technical report.
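The routing pattern itself can be illustrated with a minimal sketch. The expert count, widths, and top-k value below are illustrative assumptions rather than HiDream-I1's published configuration; the point is the dispatch of each token to a few SwiGLU experts alongside an always-active shared expert.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUExpert(nn.Module):
    """One feed-forward expert using the SwiGLU activation."""
    def __init__(self, dim, hidden):
        super().__init__()
        self.w_gate = nn.Linear(dim, hidden, bias=False)
        self.w_up = nn.Linear(dim, hidden, bias=False)
        self.w_down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))

class MoEFeedForward(nn.Module):
    """Token-wise MoE: a lightweight gate routes each token to its top-k experts,
    while a shared global expert processes every token."""
    def __init__(self, dim=512, hidden=2048, num_experts=4, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList([SwiGLUExpert(dim, hidden) for _ in range(num_experts)])
        self.shared_expert = SwiGLUExpert(dim, hidden)
        self.gate = nn.Linear(dim, num_experts, bias=False)
        self.top_k = top_k

    def forward(self, x):                                  # x: (batch, tokens, dim)
        weights, idx = self.gate(x).topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)                  # per-token routing weights
        routed = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., slot] == e                 # tokens routed to expert e in this slot
                if mask.any():
                    routed[mask] = routed[mask] + weights[..., slot][mask].unsqueeze(-1) * expert(x[mask])
        return self.shared_expert(x) + routed

moe = MoEFeedForward()
tokens = torch.randn(2, 16, 512)
print(moe(tokens).shape)                                   # torch.Size([2, 16, 512])
```

Because only the selected experts run for each token, total capacity grows with the number of experts while per-token compute stays roughly constant.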
A hybrid text encoding approach combines four streams—Long-Context CLIP (CLIP-L/14, CLIP-G/14) for context-rich grounding, a T5-XXL encoder for syntactic comprehension, and intermediate layers from Llama 3.1 8B Instruct for semantic understanding—yielding precise and flexible text-to-image synthesis capabilities. At the generative core, HiDream-I1 operates in latent space using flow-matching, mapping Gaussian noise distributions to target image representations via a pre-trained FLUX.1 VAE module.
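One way to picture this hybrid conditioning is as per-token features from the T5 and Llama encoders projected to a common width and concatenated into a single text stream, with the pooled CLIP features reserved for the adaptive-normalization pathway described above. The dimensions and wiring below are assumptions for illustration, not the model's actual conditioning code.

```python
import torch
import torch.nn as nn

# Assumed encoder output widths (illustrative, not the published sizes)
DIM_T5, DIM_LLAMA, DIM_MODEL = 4096, 4096, 2560
POOLED_CLIP_DIM = 768 + 1280          # CLIP-L/14 + CLIP-G/14 pooled features

proj_t5 = nn.Linear(DIM_T5, DIM_MODEL)
proj_llama = nn.Linear(DIM_LLAMA, DIM_MODEL)

def build_conditioning(t5_tokens, llama_tokens, pooled_clip):
    """Project per-token T5 and Llama features to the DiT width and concatenate
    them into one text-token stream; pooled CLIP features stay separate for
    adaptive layer normalization (timestep/text modulation)."""
    text_stream = torch.cat([proj_t5(t5_tokens), proj_llama(llama_tokens)], dim=1)
    return text_stream, pooled_clip

# Placeholder encoder outputs (batch=1, sequence lengths are arbitrary)
t5_tokens = torch.randn(1, 128, DIM_T5)
llama_tokens = torch.randn(1, 128, DIM_LLAMA)      # e.g. an intermediate hidden state
pooled_clip = torch.randn(1, POOLED_CLIP_DIM)

text_stream, adaln_cond = build_conditioning(t5_tokens, llama_tokens, pooled_clip)
print(text_stream.shape, adaln_cond.shape)         # torch.Size([1, 256, 2560]) torch.Size([1, 2048])
```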
An AI-generated digital painting output from HiDream-I1 Full, demonstrating its fine art portraiture capabilities. Prompt information not provided.
Training follows a multi-stage strategy. Initial pre-training uses progressive latent flow matching at resolutions scaling from 256×256 to 1024×1024, optimized with AdamW, mixed-precision training, and gradient checkpointing within a Fully Sharded Data Parallel (FSDP) framework. For post-hoc alignment, the model is further fine-tuned on curated image-text pairs to improve prompt alignment and aesthetic quality.
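A single latent flow-matching step, in schematic form, samples a timestep, interpolates between VAE latents and Gaussian noise along a straight path, and regresses the network's predicted velocity onto the noise-to-data direction. The stand-in model, latent shape, and hyperparameters below are placeholders rather than the published setup; mixed precision is shown with a bfloat16 autocast region.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

use_cuda = torch.cuda.is_available()
device = "cuda" if use_cuda else "cpu"

# Stand-in for the sparse DiT: any module mapping noisy latents to a velocity field.
# The real model also conditions on the timestep and text tokens.
model = nn.Sequential(nn.Conv2d(16, 64, 3, padding=1), nn.SiLU(),
                      nn.Conv2d(64, 16, 3, padding=1)).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)

def training_step(latents):
    """One flow-matching step on a batch of VAE latents of shape (B, C, H, W)."""
    noise = torch.randn_like(latents)
    t = torch.rand(latents.size(0), 1, 1, 1, device=latents.device)  # per-sample timestep in [0, 1)
    x_t = (1.0 - t) * latents + t * noise       # straight-line path: data at t=0, noise at t=1
    target = noise - latents                    # constant velocity along that path
    with torch.autocast(device, dtype=torch.bfloat16, enabled=use_cuda):
        loss = F.mse_loss(model(x_t), target)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)
    return loss.item()

print(training_step(torch.randn(4, 16, 32, 32, device=device)))  # 16-channel latents, small resolution
```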
Dataset curation is rigorous. Broad-scope data from web sources and licensed internal collections is filtered by automatic deduplication, using SSCD feature extraction and Faiss-based intra-cluster searches, which removes approximately 20% of images as redundant. Subsequent filtering stages apply content-safety, aesthetic-prediction, watermark-detection, and technical-quality controls before annotation with the MiniCPM-V 2.6 vision-language model, as described in the project's methodology.
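The deduplication pass can be approximated with off-the-shelf tools: embed each image with a copy-detection model (SSCD in the report), index the embeddings with Faiss, and drop near-duplicates above a similarity threshold. The sketch below is a simplified global nearest-neighbor variant rather than the intra-cluster search described above, and the threshold is an arbitrary placeholder.

```python
import numpy as np
import faiss

def deduplicate(embeddings: np.ndarray, threshold: float = 0.9):
    """Return indices of images to keep, dropping near-duplicates.

    embeddings: (N, D) float32 array, e.g. SSCD descriptors, one row per image.
    threshold: cosine similarity above which two images count as duplicates.
    """
    faiss.normalize_L2(embeddings)                  # cosine similarity via inner product
    index = faiss.IndexFlatIP(embeddings.shape[1])
    index.add(embeddings)
    sims, neighbors = index.search(embeddings, 2)   # nearest neighbor besides the image itself
    keep, dropped = [], set()
    for i, (sim, j) in enumerate(zip(sims[:, 1], neighbors[:, 1])):
        if i in dropped:
            continue
        keep.append(i)
        if sim >= threshold:
            dropped.add(int(j))                     # drop the duplicate partner
    return keep

vectors = np.random.rand(1000, 512).astype("float32")
print(f"kept {len(deduplicate(vectors))} of 1000 images")
```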
Performance and Evaluation
HiDream-I1 Full performs well on a variety of industry-standard benchmarks. On the HPS v2.1 human preference metric, the model achieves an average score of 33.82, compared to peer models such as Stable Diffusion 2 (26.38), SDXL (30.64), DALL-E 3 (31.44), and others, with strong results across multiple artistic categories including animation, concept art, painting, and photographic realism.
For prompt fidelity and semantic understanding, HiDream-I1 Full attains a DPG-Bench prompt-following score of 85.89, indicating strong relational understanding and instruction adherence. On the GenEval benchmark, the model scores 0.83 overall, with particular strength in single-object, multi-object, color, counting, and attribution tasks. These metrics provide objective measurements of the model's capabilities.
A photorealistic, AI-generated portrait by HiDream-I1 Full, demonstrating high-fidelity generation from textual prompts.
To accelerate inference and broaden applicability, HiDream-I1 Full utilizes GAN-powered Diffusion Model Distillation to train derived variants (HiDream-I1-Dev and HiDream-I1-Fast). These distilled models maintain perceptual quality while enabling reduced sampling steps—balancing speed and resource usage.
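In broad strokes, such distillation trains a few-step student to reproduce a many-step teacher's outputs while a discriminator pushes the student's samples toward the teacher's distribution. The loss composition below is a generic illustration of that idea with arbitrary weighting, not the specific objective used to produce HiDream-I1-Dev or HiDream-I1-Fast.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def distillation_losses(student_out, teacher_out, discriminator):
    """Generic few-step distillation objective: match the teacher's output and
    fool a discriminator that scores the realism of the student's samples."""
    recon = F.mse_loss(student_out, teacher_out)    # stay close to the teacher
    adv = -discriminator(student_out).mean()        # generator-side adversarial term
    return recon + 0.1 * adv                        # weighting here is arbitrary

# Toy shapes: (batch, channels, height, width) latents and a tiny conv discriminator
discriminator = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.SiLU(),
                              nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 1))
student_out = torch.randn(2, 16, 32, 32, requires_grad=True)
teacher_out = torch.randn(2, 16, 32, 32)
loss = distillation_losses(student_out, teacher_out, discriminator)
loss.backward()
print(f"combined distillation loss: {loss.item():.4f}")
```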
Illustrative graphic conveying HiDream-I1 Full's emphasis on fast, efficient image processing and rapid visual content generation.
HiDream-I1 Full forms the foundation for a broader ecosystem of generative and editable visual models:
HiDream-I1-Full is the primary model, optimized for maximal image quality and typically employing over 50 diffusion steps per sample (a usage sketch follows this list).
HiDream-I1-Dev is a guidance-distilled version operating with 28 diffusion steps, targeting an optimal balance between output quality and generation speed.
HiDream-I1-Fast reduces diffusion steps to 14, enabling near real-time synthesis with minimal compromise in perceptual quality—integral to interactive or resource-constrained deployments.
HiDream-E1 expands these capabilities to instruction-based image editing, enabling fine-grained modifications via natural language, and demonstrates strong performance on EmuEdit and ReasonEdit benchmarks.
HiDream-A1 incorporates generative, editing, and multimodal understanding in a unified conversational interface.
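When these variants are used through Hugging Face diffusers, the practical difference largely reduces to the repository id and the number of inference steps. The snippet below is a hedged sketch: the repository ids, the suitability of the generic DiffusionPipeline loader for these checkpoints, and the exact recommended step counts should be verified against the official model cards.

```python
import torch
from diffusers import DiffusionPipeline

# Assumed Hugging Face repository ids; check the official HiDream-ai model cards.
VARIANTS = {
    "full": ("HiDream-ai/HiDream-I1-Full", 50),   # highest quality, most steps
    "dev":  ("HiDream-ai/HiDream-I1-Dev", 28),    # guidance-distilled middle ground
    "fast": ("HiDream-ai/HiDream-I1-Fast", 14),   # near real-time synthesis
}

def generate(variant: str, prompt: str):
    repo_id, steps = VARIANTS[variant]
    pipe = DiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch.bfloat16)
    pipe.to("cuda")
    return pipe(prompt, num_inference_steps=steps).images[0]

if __name__ == "__main__":
    generate("fast", "a graffiti mural of a fox, photorealistic lighting").save("fox.png")
```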
Panel showing HiDream-I1 Full's versatility across cartoon and realistic portrait styles, visually illustrating its multi-style generation capacity.
This versatility supports diverse visual output, including photorealistic, artistic, cartoon, and text-bearing images, with robust adherence to user text prompts, as reflected in benchmark testing.
Applications and Limitations
HiDream-I1 Full is primarily employed for advanced text-to-image generation, lending itself to creative industries, personal projects, scientific visualization, and interactive art platforms. Its integration with editable and agent-based systems underpins workflows that require both novel content generation and iterative refinement through multimodal or conversational interfaces.
Despite optimizations, the full-scale model typically requires around 50 sampling steps, which may limit its use in low-latency or real-time environments. However, distilled variants address this concern, trading off some generative complexity for substantial speed improvements.
All output from HiDream-I1 Full is subject to licensing constraints; users retain rights to generated content for research, personal, or commercial use, provided all model and component licenses (such as MIT, Apache 2.0, and the Llama 3.1 Community License Agreement) are respected. Content creation remains subject to the legal, ethical, and safety guidelines established in the project documentation.
Release and Development Timeline
HiDream-I1 Full, along with associated models, was publicly released in April 2025, with a comprehensive technical report published in May 2025 outlining its architecture, curation processes, and benchmarking methodologies. Ongoing evolution of the HiDream model family continues to prioritize scientific transparency, open development, and integration into accessible diffusion frameworks.