CogVideoX is built on a diffusion transformer architecture and was co-developed with Tsinghua University. It is available in 2B and 5B parameter sizes: text-to-video checkpoints exist at both sizes, and image-to-video variants are released for the 5B models.
Zhipu CogVideoX
by Zhipu AI (Z.ai)
Zhipu AI's open video model family, which powers its consumer product 'Qingying' (清影).
- Open releases: CogVideoX-2B and CogVideoX-5B (Aug 2024), then CogVideoX1.5-5B (Nov 2024, 10-second clips)
- CogVideoX-2B is Apache 2.0; CogVideoX-5B uses a separate Tsinghua-authored license permitting commercial use with some restrictions
- Open weights generate up to 10s at 16fps, up to 1360×768 (CogVideoX1.5-5B; the original CogVideoX-5B produces 6s clips at 8fps, 720×480)
- Consumer product 'Qingying' has reportedly been upgraded beyond the open model to 4K/60fps with audio, per Zhipu's own Chinese-language coverage