Text-to-video

Vidu

by Shengshu Technology (生数科技)

Chinese video generation model supporting text-to-video, image-to-video, and multi-reference generation, rooted in Tsinghua University research.

Current flagship: Vidu Q3 (announced January 2026), with long-form generation and native joint audio and video output
Reference-to-Video combines multiple reference images of characters, objects, and scenes to keep them consistent
Reference-to-Video launched globally April 13, 2026
Up to 1080p output; free-credit tier, paid Creator Plan, and enterprise API

Vidu was first announced in April 2024 and has iterated quickly since. Its Reference-to-Video feature targets the consistent-character problem that trips up many single-shot generation models.