Visible tradeoffs
This is a human preference signal, so it tells you what people liked side by side, not what is formally correct.
source: Arena
metric: Arena rating (rating)
judge: Human
direction: higher is better
group id: arena_image_to_video_2026_q2
domain: Video generation
What it measures vs what it misses
✓ Measures
Observed user preference on image-conditioned video generation. How often a model's motion synthesis wins in side-by-side comparisons.
✗ Misses
Objective temporal consistency metrics. Editing precision and downstream production workflow fit.
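To make "how often a model wins in side-by-side comparisons" concrete, here is a minimal sketch that tallies win rates from pairwise human votes. The vote records and model names are hypothetical, and a raw win rate is not the Arena rating itself: the source does not say how votes are turned into the rating, so treat this only as an illustration of the underlying signal.

```python
from collections import Counter

def win_rates(votes):
    """Return each model's share of won comparisons.

    `votes` is a list of (winner, loser) pairs, one per side-by-side
    comparison judged by a human. This record format is an assumption
    made for illustration, not Arena's actual data format.
    """
    wins = Counter()
    games = Counter()
    for winner, loser in votes:
        wins[winner] += 1
        games[winner] += 1
        games[loser] += 1
    return {model: wins[model] / games[model] for model in games}

# Hypothetical votes between three placeholder models.
votes = [
    ("model_a", "model_b"),
    ("model_a", "model_c"),
    ("model_b", "model_c"),
    ("model_c", "model_a"),
]

rates = win_rates(votes)
for model, rate in sorted(rates.items()):
    print(f"{model}: {rate:.2f}")
```

A tally like this captures only which output people picked. It carries no information about temporal consistency or editing precision, which is the gap noted under "Misses".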
Why this counts
Observed user preference on image-conditioned video generation.
Comparable-group rule
This percentile only compares models inside the exact benchmark/version group shown here. It is not a universal score.
What it misses
Objective temporal consistency metrics.
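The comparable-group rule above can be sketched as a percentile computed only over rows that share the same group id. This is a minimal illustration under stated assumptions: the row layout, the model names, the ratings, and the "share of models at or below this rating" definition of percentile are all hypothetical, since the source does not specify the exact formula.

```python
def percentile_within_group(rows, model, group_id):
    """Percent of models in `group_id` rated at or below `model`.

    Rows from other groups are ignored entirely, so the result is
    only meaningful inside that one benchmark/version group.
    """
    group = [r for r in rows if r["group_id"] == group_id]
    target = next(r["rating"] for r in group if r["model"] == model)
    at_or_below = sum(1 for r in group if r["rating"] <= target)
    return 100.0 * at_or_below / len(group)

GROUP = "arena_image_to_video_2026_q2"

# Hypothetical ratings; the last row belongs to a different, made-up group.
rows = [
    {"model": "model_a", "group_id": GROUP, "rating": 1200.0},
    {"model": "model_b", "group_id": GROUP, "rating": 1100.0},
    {"model": "model_c", "group_id": GROUP, "rating": 1000.0},
    {"model": "model_d", "group_id": GROUP, "rating": 1150.0},
    {"model": "model_b", "group_id": "hypothetical_other_group", "rating": 900.0},
]

print(percentile_within_group(rows, "model_b", GROUP))
```

Because the filter runs before any ranking, the same model can land at a different percentile in another group, which is why the figure should not be read as a universal score.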