The CogVideoX family consists of open-weight diffusion transformer models developed by THUDM and Zhipu AI for text-to-video and image-to-video generation. The models use a 3D causal variational autoencoder (VAE) and 3D full attention to generate coherent videos up to 10 seconds long at resolutions up to 1360×768. They were trained on 35 million video clips and 2 billion images.
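
To give a rough sense of why the 3D causal VAE matters for 3D full attention, the sketch below estimates how many latent positions the transformer would attend over for a maximum-size clip. The compression factors (4× temporal, 8× spatial), the patch size of 2, and the 161-frame clip (10 seconds at an assumed 16 fps, plus the first frame) are illustrative assumptions rather than figures stated in this description, and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatentShape:
    frames: int
    height: int
    width: int


def causal_vae_latent_shape(num_frames, height, width,
                            temporal_factor=4, spatial_factor=8):
    """Estimate the latent grid produced by a causal 3D VAE.

    Assumption: the first frame is encoded on its own and every following
    group of `temporal_factor` frames collapses into one latent frame.
    The default factors are illustrative, not taken from the description.
    """
    if num_frames < 1 or (num_frames - 1) % temporal_factor != 0:
        raise ValueError("num_frames must equal temporal_factor * k + 1")
    if height % spatial_factor or width % spatial_factor:
        raise ValueError("height and width must be divisible by spatial_factor")
    return LatentShape(
        frames=1 + (num_frames - 1) // temporal_factor,
        height=height // spatial_factor,
        width=width // spatial_factor,
    )


def full_attention_tokens(latent, patch_size=2):
    """Count the tokens a 3D full-attention layer would see.

    Assumption: each latent frame is split into patch_size x patch_size
    patches, and every patch from every frame joins a single sequence.
    """
    patches_high = -(-latent.height // patch_size)  # ceiling division
    patches_wide = -(-latent.width // patch_size)
    return latent.frames * patches_high * patches_wide


if __name__ == "__main__":
    # Hypothetical clip: 161 frames at the maximum 1360x768 resolution.
    shape = causal_vae_latent_shape(num_frames=161, height=768, width=1360)
    print(shape, full_attention_tokens(shape))
```

Under these assumptions, attention runs over the compressed latent grid rather than raw pixels, which is what keeps a single joint sequence across space and time tractable at this resolution.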