Text-to-video

Zhipu CogVideoX

by Zhipu AI (Z.ai)

Zhipu AI's open video model family, powering their consumer 'Qingying' (清影) product.

Open releases: CogVideoX-2B and CogVideoX-5B (Aug 2024), followed by CogVideoX1.5-5B (Nov 2024, 10-second clips)
CogVideoX-2B is Apache 2.0; CogVideoX-5B uses a separate Tsinghua-authored license permitting commercial use with some restrictions
Open weights generate up to 10s at 16fps and up to 1360×768 with CogVideoX1.5-5B; the original CogVideoX-5B is limited to 6s at 8fps and 720×480
The consumer product 'Qingying' has reportedly been upgraded beyond the open model to 4K at 60fps with audio, according to Zhipu's own Chinese-language coverage

CogVideoX is a diffusion transformer model co-developed with Tsinghua University. It is available in 2B and 5B parameter sizes, with both text-to-video and image-to-video variants.
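
For readers who want to try the open text-to-video weights, the following is a minimal sketch rather than an official recipe. It assumes the Hugging Face `diffusers` library and its `CogVideoXPipeline` class, the repo id `THUDM/CogVideoX-5b`, and the original CogVideoX-5B defaults of 6 seconds at 8 fps; none of these are stated in this entry. The helper `frame_count` is a hypothetical name, and its "duration × fps + 1" convention is likewise an assumption based on published sampling defaults.

```python
def frame_count(seconds: int, fps: int) -> int:
    """Number of frames to request for a clip.

    Assumption: the samplers expect duration * fps + 1 frames
    (one extra leading frame), e.g. 49 frames for 6 s at 8 fps.
    """
    return seconds * fps + 1


def generate(prompt: str, out_path: str = "output.mp4") -> str:
    """Generate one clip with the original CogVideoX-5B weights (needs a GPU)."""
    # Heavy dependencies are imported lazily so frame_count() works without them.
    import torch
    from diffusers import CogVideoXPipeline
    from diffusers.utils import export_to_video

    # Assumed Hugging Face repo id; not given in the entry above.
    pipe = CogVideoXPipeline.from_pretrained(
        "THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16
    )
    pipe.enable_model_cpu_offload()  # trade speed for lower GPU memory use

    frames = pipe(
        prompt=prompt,
        num_frames=frame_count(seconds=6, fps=8),
        num_inference_steps=50,
        guidance_scale=6.0,
    ).frames[0]
    export_to_video(frames, out_path, fps=8)
    return out_path


if __name__ == "__main__":
    # Frame budget for a 10 s clip at 16 fps under the same assumption.
    print(frame_count(seconds=10, fps=16))
```

Calling `generate("a panda playing guitar in a bamboo forest")` would download the weights on first use and write `output.mp4`; the arithmetic helper runs on its own without a GPU or any third-party packages.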