Unbiased AI Bench: Glass box for model evals. Every leaderboard, with receipts.
Arena
Live · updated continuously
Benchmark platform


Blind human-preference arenas across chat, coding, vision, image, video, document, and search.
Verification status: verified (last checked May 1, 2026)

Evidence ledger

  • Modalities: text, code, vision, document, image, video, search
  • Cadence: continuous
  • API: not public
  • Evaluations: 773
  • Verification: verified
  • Verified runtime: 764
  • Manual verified: 0
  • Relay / mirrored: 0
  • Backfilled: 9

Relay sources mirror another provider's public page; manual rows are checked against the cited page; backfilled rows are historical inserts; seeded rows are demo fixtures. Relay rows are supporting evidence, not first-party measurements.
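The ledger counts are additive: 764 verified runtime + 0 manual + 0 relay + 9 backfilled = 773 evaluations. A minimal sketch of how such a ledger could be tallied, assuming each evaluation row carries a hypothetical `provenance` label (`runtime`, `manual`, `relay`, `backfilled`, or `seeded`). The `EvalRow` type, the label strings, and the choice to drop seeded demo fixtures from the total are illustrative assumptions, not this site's documented schema.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical provenance labels, in ledger display order (assumed, not the site's schema).
LEDGER_ORDER = ("runtime", "manual", "relay", "backfilled")


@dataclass(frozen=True)
class EvalRow:
    model: str
    benchmark: str
    provenance: str  # one of LEDGER_ORDER, or "seeded" for demo fixtures


def tally_ledger(rows):
    """Count evaluation rows per provenance bucket.

    Assumption: seeded demo fixtures are excluded entirely; relay rows are
    counted in their own bucket because they are supporting evidence only.
    """
    counts = Counter(r.provenance for r in rows if r.provenance != "seeded")
    unknown = set(counts) - set(LEDGER_ORDER)
    if unknown:
        raise ValueError(f"unknown provenance labels: {sorted(unknown)}")
    ledger = {label: counts.get(label, 0) for label in LEDGER_ORDER}
    ledger["total"] = sum(ledger[label] for label in LEDGER_ORDER)
    return ledger
```

Feeding it 764 runtime rows, 9 backfilled rows, and one seeded fixture reproduces the totals shown above, with the fixture left out of the 773.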

Operational state

  • Snapshot: latest pull on May 1, 2026 (json).
  • Parser 0.4.0 (ok): Parsed 764 Arena leaderboard records.
  • Verify (verified, May 1, 2026): Arena verification finished with status verified.
  • model_alias (open, May 1, 2026): Arena exposed leaderboard rows that are not yet mapped into the canonical registry: muse-spark (4), kimi-k2.6 (4), kimi-k2.5-thinking (4), kimi-k2.5-instant (4), mimo-v2.5 (4), glm-5.1 (3), mimo-v2.5-pro (3), deepseek-v4-pro-thinking (3), glm-5 (3), qwen3.6-plus (3), glm-4.6 (3), glm-4.7 (3), mimo-v2-pro (3), deepseek-v3.2-thinking (3), kimi-k2-thinking-turbo (3)
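The open model_alias item lists leaderboard slugs that have no entry in the canonical registry, ordered by count. A minimal sketch of that check, assuming the leaderboard yields one slug per row and the registry is a hypothetical dict from known aliases to canonical IDs; reading the parenthesized numbers as row counts and capping the message at 15 entries are assumptions drawn from the shape of the message above.

```python
from collections import Counter


def unmapped_aliases(leaderboard_slugs, registry):
    """Return (slug, row_count) pairs for slugs missing from the registry.

    `leaderboard_slugs`: iterable with one model slug per leaderboard row
    (assumed shape). `registry`: hypothetical alias -> canonical ID mapping.
    Ordered by count, descending; Counter.most_common() keeps first-seen
    order among equal counts.
    """
    missing = Counter(slug for slug in leaderboard_slugs if slug not in registry)
    return missing.most_common()


def format_review_message(missing, limit=15):
    """Render up to `limit` unmapped slugs as 'slug (count)' pairs."""
    shown = ", ".join(f"{slug} ({n})" for slug, n in missing[:limit])
    return f"Rows not yet mapped into the canonical registry: {shown}"
```

Sorting by count first surfaces the aliases that affect the most leaderboard rows, which are the ones most worth mapping next.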

Benchmarks from this source

Each benchmark reports an Arena rating.

  • Text Arena: Blind chat preference
  • Code Arena: Blind coding preference
  • Vision Arena: Blind multimodal preference
  • WebDev Arena: Blind web app preference
  • Search Arena: Blind search-grounded preference
  • Document Arena: Blind document preference
  • Text-to-Image Arena: Blind image generation preference
  • Image Edit Arena: Blind image editing preference
  • Text-to-Video Arena: Blind video generation preference
  • Image-to-Video Arena: Blind image-to-video preference
  • Video Edit Arena: Blind video editing preference
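This page does not say how an Arena rating is computed from blind votes. Assuming it is a Bradley-Terry fit to pairwise preferences reported on an Elo-like scale, a minimal sketch follows; the `base=1000` and `scale=400` constants, the iteration count, and the decision to ignore ties are all illustrative assumptions rather than Arena's published method.

```python
import math
from collections import defaultdict


def bradley_terry_ratings(votes, iters=200, scale=400.0, base=1000.0):
    """Fit Bradley-Terry strengths to (winner, loser) votes and map them to
    an Elo-like scale: rating = base + scale * log10(strength).

    Uses the minorization-maximization (MM) update
        p_i <- W_i / sum_j n_ij / (p_i + p_j)
    where W_i is model i's win count and n_ij the number of i-vs-j votes.
    Ties and self-pairs are not modeled in this sketch.
    """
    wins = defaultdict(float)
    losses = defaultdict(float)
    pair_counts = defaultdict(float)  # frozenset({i, j}) -> number of votes
    for winner, loser in votes:
        if winner == loser:
            continue
        wins[winner] += 1.0
        losses[loser] += 1.0
        pair_counts[frozenset((winner, loser))] += 1.0

    models = set(wins) | set(losses)
    if not models:
        return {}
    for m in models:
        # A finite fit needs, at minimum, every model to both win and lose.
        if wins[m] == 0 or losses[m] == 0:
            raise ValueError(f"{m!r} needs at least one win and one loss")

    strength = {m: 1.0 for m in models}
    for _ in range(iters):
        updated = {}
        for i in models:
            denom = 0.0
            for pair, n in pair_counts.items():
                if i in pair:
                    (j,) = pair - {i}
                    denom += n / (strength[i] + strength[j])
            updated[i] = wins[i] / denom
        # Bradley-Terry is scale-invariant, so pin the geometric mean strength to 1.
        log_mean = sum(math.log(s) for s in updated.values()) / len(updated)
        strength = {m: s / math.exp(log_mean) for m, s in updated.items()}

    return {m: base + scale * math.log10(s) for m, s in strength.items()}
```

With three votes for model A over B and one for B over A, the fitted strength ratio is 3, so A lands 400·log10(3), about 191 points, above B.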

Latest change explanation

Arena changed relative to the previous run, arena-20260501T202609Z. Causes: source_snapshot, parser_diff, mapping, benchmark_movement.

  • Source snapshot changed: The saved raw source snapshot changed relative to the previous run.
  • Parser output changed: The parser metadata or warnings shifted relative to the previous run.
  • Mapping or review queue changed: Mapping-related signals changed across 1 field.
  • Benchmark coverage or values moved: 8 benchmark rows were added, 0 removed, and 118 existing rows changed value or evaluation date.
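The benchmark-movement counts (8 added, 0 removed, 118 changed) amount to a keyed diff between consecutive runs. A minimal sketch, assuming each run is stored as a hypothetical dict mapping `(benchmark, model)` to a `(value, evaluation_date)` tuple; the key and value shapes are assumptions for illustration only.

```python
def diff_runs(previous, current):
    """Compare two runs keyed by (benchmark, model).

    Each run is a hypothetical dict: (benchmark, model) -> (value, eval_date).
    A row counts as "changed" when it exists in both runs and either its value
    or its evaluation date differs.
    """
    prev_keys, curr_keys = set(previous), set(current)
    added = len(curr_keys - prev_keys)
    removed = len(prev_keys - curr_keys)
    changed = sum(1 for key in prev_keys & curr_keys if previous[key] != current[key])
    return {"added": added, "removed": removed, "changed": changed}
```

Comparing whole `(value, date)` tuples means a re-dated row with an unchanged score still counts as changed, matching the "changed value or evaluation date" wording above.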