Visible tradeoffs
This is a human preference signal, so it tells you what people preferred in side-by-side comparisons, not what is formally correct.
source: Artificial Analysis
metric: Arena rating (rating)
judge: Human
direction: higher is better
group id: aa_text_to_image_current
domain: Image generation
What it measures vs what it misses
✓ Measures
Observed user preference between images generated from the same prompt. Relative appeal and usefulness in blind side-by-side comparisons.
✗ Misses
Objective prompt-faithfulness metrics. Latency and cost.
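The card does not say how individual blind votes are turned into an Arena rating. One common way to aggregate pairwise preferences is an Elo-style update, sketched here as an illustration only: the starting rating of 1000, K = 32, and the helper names `expected_score`, `elo_update`, and `ratings_from_votes` are assumptions, not Artificial Analysis's documented method.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A is preferred over B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def elo_update(r_a: float, r_b: float, a_won: bool, k: float = 32.0) -> tuple[float, float]:
    """Return updated (r_a, r_b) after one blind comparison.

    Assumption: K = 32 is an illustrative step size, not the arena's real setting.
    """
    e_a = expected_score(r_a, r_b)
    s_a = 1.0 if a_won else 0.0
    delta = k * (s_a - e_a)
    # Zero-sum: whatever A gains, B loses.
    return r_a + delta, r_b - delta


def ratings_from_votes(votes: list[tuple[str, str]], initial: float = 1000.0,
                       k: float = 32.0) -> dict[str, float]:
    """Hypothetical aggregator: each vote is (preferred_model, other_model)."""
    ratings: dict[str, float] = {}
    for winner, loser in votes:
        r_w = ratings.setdefault(winner, initial)
        r_l = ratings.setdefault(loser, initial)
        ratings[winner], ratings[loser] = elo_update(r_w, r_l, a_won=True, k=k)
    return ratings
```

Because each update is zero-sum and order-dependent, a rating built this way reflects relative preference within the pool of compared models, which fits the card's note that it captures appeal rather than objective prompt faithfulness.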
Why this counts
This metric would matter for judging text-to-image quality once verified public receipts exist in the catalog.
Comparable-group rule
This percentile only compares models inside the exact benchmark/version group shown here. It is not a universal score.
What it misses
This slice is currently limited because the product does not yet carry first-class image-generation receipts.
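To make the comparable-group rule concrete, here is a minimal sketch of a within-group percentile. It assumes a hypothetical `Result` record and defines the percentile as the share of other models in the same group that a model strictly outranks; the catalog's actual percentile formula is not stated on this card.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """Hypothetical record: one model's score in one benchmark/version group."""
    model: str
    group_id: str
    rating: float


def group_percentile(results: list[Result], model: str, group_id: str,
                     higher_is_better: bool = True) -> float:
    """Percentile of `model` among results that share exactly `group_id`.

    Assumption: percentile = 100 * (peers strictly beaten) / (number of peers).
    Results from any other group are ignored, so scores never mix across groups.
    """
    peers = [r for r in results if r.group_id == group_id]
    target = next((r for r in peers if r.model == model), None)
    if target is None:
        raise ValueError(f"{model!r} has no result in group {group_id!r}")
    others = [r for r in peers if r.model != model]
    if not others:
        return 100.0  # a lone model trivially tops its own group
    if higher_is_better:
        beaten = sum(1 for r in others if r.rating < target.rating)
    else:
        beaten = sum(1 for r in others if r.rating > target.rating)
    return 100.0 * beaten / len(others)
```

The `higher_is_better` flag mirrors the card's "direction" field, and filtering on the exact `group_id` (for example `aa_text_to_image_current`) is what keeps a high rating in some other group from shifting the percentile.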