Image-to-video

Stable Video Diffusion

by Stability AI

Open-weight model that animates a single still image into a short clip — no text prompts.

Image-conditioning only — cannot be controlled via text prompts
Base model generates 14 frames and SVD-XT generates 25 frames, both at 576×1024, for clips of 4 seconds or less
Open weights under a non-commercial/research license; commercial use requires a separate Stability AI license
Commonly run via Hugging Face, ComfyUI, or NVIDIA NIM
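A minimal sketch of the Hugging Face route, assuming the `diffusers` library (plus `torch` and `accelerate`) and a CUDA GPU. `StableVideoDiffusionPipeline` and the two `stabilityai/...` checkpoint IDs are diffusers/Hub names, not something this listing specifies. The helper names `animate` and `clip_seconds`, the 7 fps export rate, and the seed are illustrative assumptions. The heavy imports sit inside `animate()` so the frame-count and duration helpers run without a GPU. Note that the pipeline call takes an image and no prompt, which matches the image-only conditioning described in the list.

```python
"""Sketch: animating one still image with Stable Video Diffusion via diffusers.

Assumptions: torch, diffusers and accelerate are installed and a CUDA GPU is
available. Helper names, the 7 fps default and the seed are illustrative.
"""

# Frame counts from the spec: base model 14 frames, SVD-XT 25 frames.
MODELS = {
    "svd": ("stabilityai/stable-video-diffusion-img2vid", 14),
    "svd-xt": ("stabilityai/stable-video-diffusion-img2vid-xt", 25),
}

# 576x1024 is height x width, i.e. a 1024-wide, 576-tall landscape frame.
WIDTH, HEIGHT = 1024, 576


def clip_seconds(num_frames: int, fps: float) -> float:
    """Playback length of a clip with num_frames frames exported at fps."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return num_frames / fps


def animate(image_path: str, out_path: str, model: str = "svd-xt",
            fps: int = 7, seed: int = 42) -> float:
    """Animate one still image, write an MP4, and return its length in seconds."""
    # Imported lazily so the helpers above work without torch/diffusers.
    import torch
    from diffusers import StableVideoDiffusionPipeline
    from diffusers.utils import export_to_video, load_image

    repo_id, num_frames = MODELS[model]
    pipe = StableVideoDiffusionPipeline.from_pretrained(
        repo_id, torch_dtype=torch.float16, variant="fp16"
    )
    pipe.enable_model_cpu_offload()  # lowers GPU memory use at some speed cost

    # The still image is the only conditioning input: there is no prompt argument.
    image = load_image(image_path).resize((WIDTH, HEIGHT))
    generator = torch.manual_seed(seed)
    frames = pipe(
        image,
        num_frames=num_frames,
        fps=fps,              # frame-rate micro-conditioning passed to the model
        decode_chunk_size=8,  # decode frames in chunks to limit VRAM
        generator=generator,
    ).frames[0]
    export_to_video(frames, out_path, fps=fps)
    return clip_seconds(num_frames, fps)
```

Under these assumptions, `animate("input.png", "clip.mp4")` would write a 25-frame SVD-XT clip that plays for about 3.6 seconds at 7 fps, while `model="svd"` gives 14 frames, or 2 seconds at the same rate. Downloading the weights is subject to the license noted in the list.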

Stable Video Diffusion is a narrower tool than most entries here: it only animates an existing still image rather than generating video from a text description. Because it needs a starting frame, it is commonly used as a building block inside other pipelines (for example, to animate the output of a text-to-image model) rather than as a standalone product.