source
Arena
Observed user preferences between images generated from the same text prompt. Relative appeal and usefulness in head-to-head image generation comparisons.
Objective faithfulness metrics. Detailed category-specific behavior such as typography, anatomy, or commercial design accuracy.