UAB · Unbiased AI Bench · Glass box for model evals.
Every leaderboard, with receipts.
Live · updated continuously
Benchmarks · /benchmarks/arena-webdev

WebDev Arena

Blind preference arena for web app generation and front-end implementation quality.
Source · Arena
Version · arena snapshot 2026-05-01
Scores · 61

Passport

Visible tradeoffs · This is a human preference signal, so it tells you what people liked side by side, not what is formally correct.
source · Arena
metric · Arena rating (rating)
judge · Human
direction · higher better
group id · arena_webdev_2026_q2
domain · Coding

What it measures vs what it misses

✓ Measures

Perceived usefulness and polish on browser-facing coding tasks. How often a model's generated web experience wins in side-by-side judgments.

✗ Misses

Objective correctness or runtime reliability. Accessibility, maintainability, and deploy-time quality unless voters notice them directly.

Why this counts · It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Comparable-group rule · This percentile only compares models inside the exact benchmark/version group shown here. It is not a universal score.

What it misses · It does not fully capture repo-scale iteration, IDE ergonomics, or long debugging loops.
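The comparable-group rule can be illustrated with a small sketch. The page does not state how its percentile is computed, so the definition below (share of other models in the same group that a model ties or beats) is an assumption, as are the `Score` type, the `percentile_in_group` helper, and the `other_benchmark_v1` group id. The ratings are a handful of rows taken from this benchmark version.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Score:
    model: str
    group_id: str
    rating: float


def percentile_in_group(scores, model, group_id):
    """Percent of *other* models in the same group that `model` ties or beats.

    Assumed definition for illustration; the site's exact formula is not given.
    """
    # Only rows from the exact benchmark/version group are comparable.
    group = [s for s in scores if s.group_id == group_id]
    target = next((s for s in group if s.model == model), None)
    if target is None:
        raise KeyError(f"{model!r} has no score in group {group_id!r}")
    others = [s for s in group if s.model != model]
    if not others:
        return 100.0
    beaten = sum(1 for s in others if s.rating <= target.rating)
    return 100.0 * beaten / len(others)


GROUP = "arena_webdev_2026_q2"
scores = [
    Score("Claude Opus 4.7", GROUP, 1561),
    Score("GLM-5.1", GROUP, 1534),
    Score("GPT-5.5", GROUP, 1447),
    Score("devstral-medium-2507", GROUP, 1091),
    # Hypothetical row from a different group: it must never affect the result.
    Score("GPT-5.5", "other_benchmark_v1", 99.0),
]

top = percentile_in_group(scores, "Claude Opus 4.7", GROUP)
mid = percentile_in_group(scores, "GPT-5.5", GROUP)
bottom = percentile_in_group(scores, "devstral-medium-2507", GROUP)
```

Filtering by `group_id` before ranking is the point of the sketch: a score from another benchmark or version is never mixed into the comparison, which is why the resulting percentile is not a universal score.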

Leaderboard · this benchmark version

#1 · Claude Opus 4.7 · AR · May 1, 2026 · 1,561
#2 · Claude Opus 4.6 · AR · May 1, 2026 · 1,543
#3 · GLM-5.1 · AR · May 1, 2026 · 1,534
#4 · Claude Sonnet 4.6 · AR · May 1, 2026 · 1,527
#5 · Kimi K2.6 · AR · May 1, 2026 · 1,526
#6 · muse-spark · AR · May 1, 2026 · 1,509
#7 · MiMo-V2.5-Pro · AR · May 1, 2026 · 1,475
#8 · Claude Opus 4.5 · AR · May 1, 2026 · 1,467
#9 · Qwen3.6 Plus · AR · May 1, 2026 · 1,467
#10 · deepseek-v4-pro-thinking · AR · May 1, 2026 · 1,455
#11 · Gemini 3.1 Pro Preview · AR · May 1, 2026 · 1,453
#12 · GPT-5.5 · AR · May 1, 2026 · 1,447
#13 · mimo-v2.5 · AR · May 1, 2026 · 1,444
#14 · GLM-4.7 · AR · May 1, 2026 · 1,440
#15 · Gemini 3 Pro Preview · AR · May 1, 2026 · 1,438
#16 · GPT-5.4 · AR · May 1, 2026 · 1,437
#17 · GLM-5 · AR · May 1, 2026 · 1,437
#18 · kimi-k2.5-thinking · AR · May 1, 2026 · 1,430
#19 · MiMo-V2-Pro · AR · May 1, 2026 · 1,430
#20 · MiniMax-M2.7 · AR · May 1, 2026 · 1,411
#21 · Grok 4.3 · AR · May 1, 2026 · 1,408
#22 · kimi-k2.5-instant · AR · May 1, 2026 · 1,408
#23 · GPT-5.3 Codex · AR · May 1, 2026 · 1,407
#24 · GPT-5.4 mini · AR · May 1, 2026 · 1,400
#25 · Grok 4.20 · AR · May 1, 2026 · 1,399
#26 · GPT-5 · AR · May 1, 2026 · 1,393
#27 · GPT-5.4 nano · AR · May 1, 2026 · 1,393
#28 · minimax-m2.1-preview · AR · May 1, 2026 · 1,392
#29 · Gemini 3 Flash · AR · May 1, 2026 · 1,389
#30 · Qwen3.5 397B A17B · AR · May 1, 2026 · 1,387
#31 · Claude Sonnet 4.5 · AR · May 1, 2026 · 1,386
#32 · Claude Opus 4.1 · AR · May 1, 2026 · 1,385
#33 · Claude Opus 4 · AR · May 1, 2026 · 1,385
#34 · MiniMax-M2.5 · AR · May 1, 2026 · 1,383
#35 · deepseek-v3.2-thinking · AR · May 1, 2026 · 1,368
#36 · Qwen3.5 122B A10B · AR · May 1, 2026 · 1,363
#37 · GLM-4.6 · AR · May 1, 2026 · 1,355
#38 · Qwen3.5 27B · AR · May 1, 2026 · 1,350
#39 · GPT-5.2 · AR · May 1, 2026 · 1,335
#40 · DeepSeek Chat · AR · May 1, 2026 · 1,332
#41 · kimi-k2-thinking-turbo · AR · May 1, 2026 · 1,330
#42 · Claude Haiku 4.5 · AR · May 1, 2026 · 1,317
#43 · MiniMax-M2 · AR · May 1, 2026 · 1,304
#44 · MiMo-V2-Flash · AR · May 1, 2026 · 1,300
#45 · DeepSeek V3.2 Exp · AR · May 1, 2026 · 1,286
#46 · Qwen3-Coder 480B A35B · AR · May 1, 2026 · 1,281
#47 · KAT-Coder-Pro V1 · AR · May 1, 2026 · 1,258
#48 · Qwen3.5 35B A3B · AR · May 1, 2026 · 1,248
#49 · Trinity Large Thinking · AR · May 1, 2026 · 1,246
#50 · GPT-5.1 · AR · May 1, 2026 · 1,239
#51 · Gemini 3.1 Flash-Lite Preview · AR · May 1, 2026 · 1,238
#52 · Qwen3.5 Flash · AR · May 1, 2026 · 1,236
#53 · Grok 4.1 Fast · AR · May 1, 2026 · 1,234
#54 · Mistral Large 3 · AR · May 1, 2026 · 1,222
#55 · Grok 4.1 · AR · May 1, 2026 · 1,207
#56 · Gemini 2.5 Pro · AR · May 1, 2026 · 1,203
#57 · Devstral 2 · AR · May 1, 2026 · 1,199
#58 · Mercury 2 · AR · May 1, 2026 · 1,165
#59 · Grok 4 Fast · AR · May 1, 2026 · 1,149
#60 · Grok Code Fast · AR · May 1, 2026 · 1,139
#61 · devstral-medium-2507 · AR · May 1, 2026 · 1,091
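
One way to read the gaps between ratings above is as an expected head-to-head preference rate. The page does not document how the Arena rating is scaled, so the sketch below assumes a conventional Elo-style logistic curve with a 400-point scale; both the formula and the `expected_win_probability` helper are illustrative assumptions, not this site's or Arena's documented method.

```python
def expected_win_probability(rating_a, rating_b, scale=400.0):
    """Chance that model A is preferred over model B in one side-by-side vote.

    Assumes an Elo-style logistic curve; the 400-point scale is an assumption.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Ratings for #1 and #2 in this benchmark version.
p_first_over_second = expected_win_probability(1561, 1543)

# Ratings for #1 and #61, the widest gap in this benchmark version.
p_first_over_last = expected_win_probability(1561, 1091)
```

Under these assumptions the 18-point gap between #1 and #2 corresponds to a preference rate of about 0.526, only slightly better than a coin flip, so small differences near the top of the list should be read with caution.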