Model Report
CagliostroLab / Animagine XL
Animagine XL is an open-source text-to-image model developed by Cagliostro Research Lab, fine-tuned from Stable Diffusion XL to specialize in anime-style illustration generation. The model employs structured tag-based prompting inspired by Danbooru conventions and incorporates aesthetic evaluation techniques to produce high-resolution character art with improved anatomical accuracy, particularly in hand rendering, a common weakness of AI-generated anime artwork.
Animagine XL is an open-source series of anime-themed text-to-image generative models created by Cagliostro Research Lab in collaboration with SeaArt.ai. Fine-tuned from Stable Diffusion XL, Animagine XL specializes in producing high-resolution, detailed anime-style illustrations from descriptive textual prompts. It is designed to enhance the synthesis of character art, improve prompt interpretation, and more faithfully render complex anatomical features such as hands, which present particular challenges for generative AI models.
Introductory showcase video illustrating the output quality and diversity of Animagine XL V3.1. [Source]
Model Features and Prompting Strategy
Animagine XL employs a variety of features and architectural refinements tailored for the anime art domain. The model integrates advanced prompt parsing strategies—using a structured tag ordering inspired by NovelAI tag ordering documentation—to achieve consistent and accurate character synthesis. Recommended prompts typically begin with the number and gender of characters, followed by character name, series, and additional descriptive tags. This method enables more precise interpretation of user intentions and facilitates the rendering of both iconic and original anime characters.
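The recommended tag ordering can be sketched as a small helper. The function below is illustrative only, not part of any official Animagine XL API; it simply concatenates tags in the order the documentation recommends.

```python
def build_prompt(count_gender, character="", series="", tags=()):
    """Assemble an Animagine XL prompt in the recommended order:
    number/gender of characters, then character name, series,
    and additional descriptive tags."""
    parts = [count_gender]
    if character:
        parts.append(character)
    if series:
        parts.append(series)
    parts.extend(tags)
    return ", ".join(parts)

prompt = build_prompt(
    "1girl",
    character="asuka langley soryu",
    series="neon genesis evangelion",
    tags=("school uniform", "smiling", "waving"),
)
print(prompt)
# → "1girl, asuka langley soryu, neon genesis evangelion, school uniform, smiling, waving"
```

Placing the character count and gender first gives the model the strongest structural signal before more specific tags refine the scene.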
The model supports a diverse set of tags affecting output characteristics, including quality modifiers (such as "masterpiece" or "good quality"), rating tags (for content control, such as "safe" or "sensitive"), and art era modifiers (such as "newest" or "oldest"). From version 3.1 onwards, aesthetic evaluation tags, derived from a dedicated Vision Transformer (ViT) classifier trained on anime art, can guide outputs towards higher visual appeal. Multi-aspect ratio generation is also supported, covering square, portrait, and landscape formats at various resolutions.
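The modifier tags named above can be composed into a final prompt in the same comma-separated style. The helper and the placement of the modifiers after the subject tags are illustrative assumptions; the tag strings themselves ("masterpiece", "good quality", "safe", "newest") are taken from the description above.

```python
QUALITY = "masterpiece, good quality"  # quality modifiers
RATING = "safe"                        # rating tag for content control
ERA = "newest"                         # art era modifier

def tagged_prompt(subject, quality=QUALITY, rating=RATING, era=ERA):
    # Illustrative composition: subject tags first, then quality,
    # rating, and era modifiers appended at the end.
    return ", ".join([subject, quality, rating, era])

print(tagged_prompt("1girl, blue hair, portrait"))
# → "1girl, blue hair, portrait, masterpiece, good quality, safe, newest"
```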
Training Data and Technical Foundations
Animagine XL is a diffusion-based generative model built on the Stable Diffusion XL base and further fine-tuned through proprietary methods. It employs a specialized VAE, madebyollin/sdxl-vae-fp16-fix, which improves numerical stability when encoding and decoding high-resolution images in half precision.
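As a sketch of how the base model and this VAE are typically combined, the following assumes the Hugging Face `diffusers` library; the model repository id `cagliostrolab/animagine-xl-3.1` is an assumption about where the published weights live, so verify it before use.

```python
# Hypothetical loader sketch; repository ids are assumptions, not
# confirmed by this document.
MODEL_ID = "cagliostrolab/animagine-xl-3.1"
VAE_ID = "madebyollin/sdxl-vae-fp16-fix"

def load_pipeline(device="cuda"):
    # Imports are deferred so the module can be inspected without
    # torch/diffusers installed; loading downloads several GB of weights.
    import torch
    from diffusers import AutoencoderKL, StableDiffusionXLPipeline

    vae = AutoencoderKL.from_pretrained(VAE_ID, torch_dtype=torch.float16)
    pipe = StableDiffusionXLPipeline.from_pretrained(
        MODEL_ID, vae=vae, torch_dtype=torch.float16
    )
    return pipe.to(device)
```

Swapping in the fp16-fix VAE avoids the NaN artifacts the stock SDXL VAE can produce when run in float16.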
Training for Animagine XL 3.0 was conducted on approximately 1.2 million images during the initial feature alignment stage, using additional curated image subsets for subsequent refinement and aesthetic tuning, for a total of roughly 2.1 million images across versions 2.0 and 3.0. Training processes incorporated custom scripts adapted from kohya-ss/sd-scripts and leveraged advanced label association techniques to optimize tag learning. Hyperparameters were tuned in multiple training stages, adjusting learning rates and batch sizes to balance stability, convergence, and expressiveness.
Aesthetic evaluation tags were established using the aesthetic-shadow-v2 ViT classifier to score and prioritize visually appealing outputs in the training pipeline, resulting in more refined generations and consistent character appeal.
Applications and Evaluations
Animagine XL primarily caters to anime artists, illustrators, and enthusiasts seeking to create character art, fan art, or original concept pieces from descriptive text. The model demonstrates proficiency in generating recognizable anime characters, often requiring only prompt-based specification rather than supplementary fine-tuning via LoRA techniques. Its improvements in anatomical rendering, especially of hands, address previously noted deficits within AI art generation.
Sample output illustrating improved hand anatomy; prompt: 'smiling girl in school uniform waving'.
Animagine XL 3.1 has received consistently high community ratings on Civitai. Empirical analysis shows that the choice of prompt structure and the CFG (Classifier-Free Guidance) scale materially affect output clarity and fidelity.
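The CFG scale controls how strongly each denoising step is pushed toward the prompt. A minimal numerical sketch of the standard classifier-free guidance formula (generic diffusion math, not Animagine-specific code):

```python
def cfg_combine(eps_uncond, eps_cond, scale):
    """Classifier-free guidance:
    eps = eps_uncond + scale * (eps_cond - eps_uncond).
    scale = 1 reproduces the conditional prediction exactly;
    larger scales push the sample harder toward the prompt."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]

uncond = [0.10, -0.20, 0.05]  # toy unconditional noise prediction
cond = [0.30, -0.10, 0.25]    # toy prompt-conditioned noise prediction

for s in (1.0, 7.0, 12.0):    # low to high CFG scales
    print(s, cfg_combine(uncond, cond, s))
```

At low scales the guided prediction stays close to the unconditional one (blurrier, less prompt-faithful images); at high scales the difference term dominates, which sharpens detail but can over-saturate or distort if pushed too far.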
Grid comparison illustrating the effect of different CFG scale values on generation sharpness and coherence. Lower CFG produces blurrier images; higher values yield sharper, more detailed results.
Animagine XL is optimized for anime aesthetics rather than photorealism, and its design makes it less suitable for tasks outside the anime domain. While improvements have been made, occasional anatomical inconsistencies can still arise, particularly with complex hand poses. Character generation is most effective when prompts use Danbooru-style structured tags; natural language prompts may yield less reliable results. The prevalence of high-quality, mature-rated images in the training data can sometimes result in incidental generation of sensitive material unless properly constrained with negative prompts and content-specific tags.
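Constraining sensitive outputs is typically done by combining a rating tag in the positive prompt with a negative prompt. The specific negative tags below are common community choices, shown for illustration; this document does not prescribe an official list.

```python
# Illustrative negative-prompt composition for content control;
# the tag list is an assumption, not an official Animagine XL default.
NEGATIVE_TAGS = [
    "nsfw", "lowres", "bad anatomy", "bad hands",
    "worst quality", "low quality",
]

negative_prompt = ", ".join(NEGATIVE_TAGS)
print(negative_prompt)
# → "nsfw, lowres, bad anatomy, bad hands, worst quality, low quality"
```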
The dataset, while extensive, does not exhaustively cover the full breadth of anime character design, which may limit representation of obscure or newly introduced characters without additional fine-tuning. The training process for Animagine XL 3.0 also encountered challenges in distributed gradient synchronization that led to partial parameter updates during multi-GPU training; the developers noted these as targets for future optimization.
Sample depicting enhanced hand gesture synthesis, addressing a known challenge in anime-style text-to-image generation.
Animagine XL has seen several developmental milestones. Version 2.0 laid the groundwork for aesthetic optimization, while version 3.0 expanded the training set and introduced advanced tag handling. The latest release, Animagine XL 3.1, further refines model performance with improved aesthetic tagging and enhanced prompt control.
The model and its weights are released under the Fair AI Public License 1.0-SD (FAIPL-1.0-SD), which sets conditions for use and modification: it requires that source code be made available for modified versions offered over a network, and that derivative works be released under compatible licensing terms.
Sample Outputs
Generated illustration: Asuka Langley Soryu on an ornate throne. Prompt: 'Asuka Langley Soryu, sitting regally on a throne, dramatic, detailed, masterpiece'.