Model Report
fal / AuraFlow v0.3
AuraFlow v0.3 is a 6.8 billion parameter, flow-based text-to-image generative model developed by fal.ai. Built on an optimized DiT architecture with Maximal Update Parametrization, it features enhanced prompt following through comprehensive recaptioning and prompt enhancement pipelines. The model supports multiple aspect ratios, achieved a GenEval score of 0.703, and produces outputs ranging from diverse artistic styles to photorealism.
AuraFlow v0.3 is a fully open-source, flow-based text-to-image generative model developed by fal.ai. It is the latest iteration in the AuraFlow series, building on AuraFlow v0.1 and v0.2, and incorporates architectural refinements and improved prompt fidelity over those versions. The model is characterized by its support for a range of image aspect ratios, sophisticated prompt following, and scalability optimizations, all released under an open license to foster transparency and research collaboration.
Collage of six distinct images generated by AuraFlow v0.3, demonstrating its capabilities in diverse subjects and visual styles from detailed character portraits to dynamic scenes and artistic compositions.
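For reference, the open weights can be loaded through Hugging Face's diffusers library, which provides an AuraFlowPipeline class. The snippet below is a minimal sketch; the repository id and sampling parameters shown are typical values and may need adjustment for a given setup.

```python
import torch
from diffusers import AuraFlowPipeline

# Repository id assumed from the Hugging Face release of v0.3.
pipe = AuraFlowPipeline.from_pretrained(
    "fal/AuraFlow-v0.3",
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    prompt="a lighthouse on a cliff at dusk, detailed watercolor",
    height=1024,
    width=1024,
    num_inference_steps=50,   # illustrative defaults
    guidance_scale=3.5,
).images[0]
image.save("auraflow_sample.png")
```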
AuraFlow v0.3 is built on a flow-based generative modeling approach, distinguishing itself from diffusion-based models commonly seen in the field. The developmental trajectory began with efforts to reimplement the MMDiT architecture, but empirical experiments led to an optimized design utilizing larger DiT encoder blocks. This improved both computational scalability and efficiency, yielding a 15% increase in model flops utilization at the 6.8 billion parameter scale.
A major design focus was maximizing Model Flops Utilization (MFU). The use of torch.compile during training further optimized layer computation, contributing an additional 10–15% improvement in MFU depending on the training stage. The model employs Maximal Update Parametrization (muP) for predictable, scalable learning-rate transfer across model sizes, which proved more consistent than standard parameterization approaches.
Line chart illustrating the relationship between model double layers, learning rate, and test loss, supporting architectural optimization during AuraFlow development.
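To make the flow-based objective concrete, the following is a minimal, generic sketch of a rectified-flow-style training step in PyTorch. It illustrates the family of objectives AuraFlow belongs to, not fal's actual training code; the model call signature is a placeholder.

```python
import torch
import torch.nn.functional as F

def flow_matching_loss(model, x1, text_emb):
    """Generic flow-matching step. x1: clean latents, text_emb: caption embeddings."""
    x0 = torch.randn_like(x1)                      # noise endpoint of the path
    t = torch.rand(x1.shape[0], device=x1.device)  # uniform timesteps in [0, 1]
    t_exp = t.view(-1, 1, 1, 1)
    xt = (1 - t_exp) * x0 + t_exp * x1             # linear interpolation between endpoints
    v_target = x1 - x0                             # constant target velocity along the path
    v_pred = model(xt, t, text_emb)                # network predicts the velocity field
    return F.mse_loss(v_pred, v_target)

# Compiling the backbone is the kind of change the team credits with the
# reported 10-15% MFU gain:
# model = torch.compile(model)
```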
With a parameter count of 6.8 billion, AuraFlow v0.3 leverages a “fatter” design in which matrix multiplication dimensions are divisible by 256. The team investigated optimal width-to-depth ratios, ultimately adopting a hidden dimension of 3072 across 36 blocks for this release, an approach inspired by scaling-law research in generative modeling.
Scatter plot visualizing model configuration experiments; the color gradient represents how aspect ratio influences training loss for different parameter scales.
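The preference for dimensions divisible by 256 keeps matrix shapes friendly to GPU tensor cores. As a back-of-envelope check of the 3072:36 configuration, the standard approximation of about 12·d² parameters per transformer block gives roughly 4.1B parameters for a uniform backbone alone; this is an estimate that ignores double-stream layers, embeddings, and other components, not the exact 6.8B breakdown.

```python
hidden_dim, num_blocks = 3072, 36

# Hardware-friendly shape check: 3072 = 12 * 256
assert hidden_dim % 256 == 0

# Standard transformer block: ~4*d^2 (attention) + ~8*d^2 (4x MLP) = 12*d^2
params_per_block = 12 * hidden_dim**2
backbone_params = num_blocks * params_per_block
print(f"{backbone_params / 1e9:.2f}B backbone parameters")  # ~4.08B
```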
The AuraFlow v0.3 model was trained on a curated set of aesthetic datasets, further refined from those used in previous iterations. A characteristic of the training process is an extensive recaptioning pipeline: all training image captions were generated using either an in-house captioner or from reliably captioned external data, explicitly avoiding alt-texts. This methodology mirrors the approach seen in the DALL·E 3 workflow, aiming to maximize caption quality for improved prompt alignment.
To enhance the model’s ability to understand and follow intricate instructions, AuraFlow integrates a prompt enhancement pipeline. This system analyzes and expands input prompts into detailed captions, articulating object positions, colors, counts, and spatial relationships while preserving the original intent. The model translates complex textual cues into images with detail and accuracy.
AuraFlow response to a complex prompt: Includes a woman, grand piano, potted plant, and three cardboard boxes containing pink spheres, kittens, and citrus fruits—showcasing sophisticated prompt following.
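fal has not published the prompt enhancement pipeline's code, but conceptually it acts as a caption-expansion stage in front of the generator. The sketch below is a hypothetical illustration; the instruction template and the llm object are assumed stand-ins, not fal's implementation.

```python
# Hypothetical prompt-enhancement wrapper; the enhancer model and
# instruction template are illustrative assumptions.

ENHANCE_INSTRUCTION = (
    "Rewrite the user's prompt as a detailed image caption. Spell out "
    "object positions, colors, counts, and spatial relationships, but "
    "preserve the original intent. Prompt: "
)

def expand_prompt(llm, user_prompt: str) -> str:
    """Expand a terse prompt into a detailed caption before generation."""
    return llm.generate(ENHANCE_INSTRUCTION + user_prompt)

# enhanced = expand_prompt(llm, "a woman at a grand piano with three boxes")
# image = pipe(prompt=enhanced).images[0]
```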
The training process employed advanced distributed techniques, utilizing large-scale GPU fleets and distributed storage systems such as JuiceFS for efficient streaming and staging of massive data volumes. The model underwent multi-stage pretraining, first at 256x256 resolution, then 512x512 and 1024x1024, before final fine-tuning on varied aspect ratios. Overall, the model was trained from scratch over four weeks, leveraging greater compute resources than earlier releases.
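The resolution curriculum can be summarized as a simple staged configuration. Only the progression itself comes from the report; step counts, batch sizes, and other hyperparameters are not published and are therefore omitted here.

```python
# Resolution curriculum as described in the report; hyperparameters
# beyond the progression itself are not public and are left out.
TRAINING_STAGES = [
    ("pretrain-256", (256, 256)),
    ("pretrain-512", (512, 512)),
    ("pretrain-1024", (1024, 1024)),
    ("finetune-aspect-ratios", "varied aspect ratios"),
]

for name, resolution in TRAINING_STAGES:
    print(f"{name}: train at {resolution}")
```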
Performance Evaluation and Output Quality
AuraFlow v0.3's performance has been evaluated on standard benchmarks. On GenEval, the model scored between 0.63 and 0.67 during pretraining phases and reached 0.703 in the final release when using the enhanced prompt pipeline, reflecting strong prompt alignment and output quality in text-to-image generation. These metrics are consistent with the model's focus on prompt accuracy.
Example outputs demonstrate both fidelity to the prompt and a variety of artistic styles, such as stylized illustrations, photorealistic scenes, and nuanced artistic compositions generated in response to detailed prompts.
Stylized illustration as generated by AuraFlow: A split-colored cat holding a martini glass, rendered in an Art Nouveau frame according to a detailed prompt.
For side-by-side visual benchmarking, users can compare AuraFlow generations with those from other models in an online comparison gallery.
Applications, Limitations, and Development
AuraFlow v0.3 is suited primarily to text-to-image synthesis, serving research, artistic, and content generation applications. The open-source release encourages community experimentation, including finetuning, adaptation to workflows such as ComfyUI, and further model quantization or extension. The development team actively incorporates community feedback into iterative improvements.
The model is currently designated a beta release. Continued development aims to address its compute-intensive training requirements and to improve usability on less powerful hardware, potentially through smaller model variants or mixture-of-experts architectures. The maintainers plan to supplement the release with more detailed technical documentation as the project evolves.
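In the meantime, users on constrained GPUs can lean on standard diffusers memory-saving options when running the model, as in the sketch below. The offload call is a standard diffusers API, though the repository id is assumed from the release and actual savings depend on the hardware.

```python
import torch
from diffusers import AuraFlowPipeline

# Load in half precision and stream submodules to the GPU only when needed;
# trades generation speed for a much smaller peak VRAM footprint.
pipe = AuraFlowPipeline.from_pretrained(
    "fal/AuraFlow-v0.3",          # repository id assumed from the HF release
    torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()   # standard diffusers offload hook

image = pipe("a split-colored cat in an art nouveau frame").images[0]
```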
Model Family and Release History
AuraFlow v0.3 succeeds earlier models in the series, with each iteration incorporating architectural and data methodology refinements. The release timeline is as follows:
AuraFlow v0.1: The initial open-source release, establishing the foundation for subsequent work.
AuraFlow v0.2: An intermediary version with incremental improvements.
AuraFlow v0.3: Current release, offering greater fidelity, optimized architecture, and prompt-following enhancements.
Development has been driven by key contributors including @cloneofsimo and @isidentical, with support from the ComfyUI and HuggingFace communities. Related projects, such as Lavenderflow-v0, document earlier architectural experimentation in the context of generative transformer research.