UAB · Unbiased AI Bench · Glass box for model evals.
Every leaderboard, with receipts.
Live · updated continuously
Benchmarks · /benchmarks/arena-code

Code Arena

Blind coding preference arena for code generation and editing quality.
Source · Arena
Version · arena snapshot 2026-05-01
Scores · 61

Passport

Visible tradeoffs

This is a human preference signal, so it tells you what people liked side by side, not what is formally correct.
Source · Arena
Metric · Arena rating (rating)
Judge · Human
Direction · higher is better
Group ID · arena_code_2026_q2
Domain · Coding

What it measures vs what it misses

✓ Measures

Human preference over coding outputs. Perceived usefulness and style fit in side-by-side code tasks.

✗ Misses

Pass/fail correctness. Latency and cost.

Why this counts

It tells you whether the model can generate, repair, and reason over code under evaluator scrutiny rather than on curated marketing examples.

Comparable-group rule

This percentile only compares models inside the exact benchmark/version group shown here. It is not a universal score.

What it misses

It does not fully capture repo-scale iteration, IDE ergonomics, or long debugging loops.
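To illustrate the comparable-group rule, here is a minimal Python sketch that computes a percentile only among models sharing the same group id. The page does not publish its percentile formula, so the one used here (share of other models in the group with a strictly lower rating) is an assumption, as are the function name `group_percentiles` and the group `hypothetical_other_group`. The three ratings are the top three entries from this benchmark version.

```python
from collections import defaultdict


def group_percentiles(scores):
    """Map (group_id, model) -> percentile computed inside that group only.

    ASSUMPTION: percentile = 100 * (models rated strictly lower) / (n - 1).
    The page does not state its actual formula.
    """
    by_group = defaultdict(list)
    for group_id, model, rating in scores:
        by_group[group_id].append((model, rating))

    result = {}
    for group_id, rows in by_group.items():
        n = len(rows)
        for model, rating in rows:
            below = sum(1 for _, other in rows if other < rating)
            # A group with one model has nothing to compare against;
            # report 100.0 by convention (also an assumption).
            result[(group_id, model)] = 100.0 * below / (n - 1) if n > 1 else 100.0
    return result


scores = [
    ("arena_code_2026_q2", "Claude Opus 4.7", 1561),
    ("arena_code_2026_q2", "Claude Opus 4.6", 1543),
    ("arena_code_2026_q2", "GLM-5.1", 1534),
    # Hypothetical entry from a different group; it must not affect the above.
    ("hypothetical_other_group", "model-x", 9999),
]

pct = group_percentiles(scores)
for key, value in sorted(pct.items()):
    print(key, value)
```

The entry with rating 9999 sits in a different group, so it leaves the percentiles inside `arena_code_2026_q2` unchanged. That is the point of the rule: a percentile here says nothing about a model's standing on any other benchmark or version.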

Leaderboard · this benchmark version

#1 · Claude Opus 4.7 · AR · May 1, 2026 · 1,561
#2 · Claude Opus 4.6 · AR · May 1, 2026 · 1,543
#3 · GLM-5.1 · AR · May 1, 2026 · 1,534
#4 · Claude Sonnet 4.6 · AR · May 1, 2026 · 1,527
#5 · Kimi K2.6 · AR · May 1, 2026 · 1,526
#6 · muse-spark · AR · May 1, 2026 · 1,509
#7 · MiMo-V2.5-Pro · AR · May 1, 2026 · 1,475
#8 · Claude Opus 4.5 · AR · May 1, 2026 · 1,467
#9 · Qwen3.6 Plus · AR · May 1, 2026 · 1,467
#10 · deepseek-v4-pro-thinking · AR · May 1, 2026 · 1,455
#11 · Gemini 3.1 Pro Preview · AR · May 1, 2026 · 1,453
#12 · GPT-5.5 · AR · May 1, 2026 · 1,447
#13 · mimo-v2.5 · AR · May 1, 2026 · 1,444
#14 · GLM-4.7 · AR · May 1, 2026 · 1,440
#15 · Gemini 3 Pro Preview · AR · May 1, 2026 · 1,438
#16 · GPT-5.4 · AR · May 1, 2026 · 1,437
#17 · GLM-5 · AR · May 1, 2026 · 1,437
#18 · kimi-k2.5-thinking · AR · May 1, 2026 · 1,430
#19 · MiMo-V2-Pro · AR · May 1, 2026 · 1,430
#20 · MiniMax-M2.7 · AR · May 1, 2026 · 1,411
#21 · Grok 4.3 · AR · May 1, 2026 · 1,408
#22 · kimi-k2.5-instant · AR · May 1, 2026 · 1,408
#23 · GPT-5.3 Codex · AR · May 1, 2026 · 1,407
#24 · GPT-5.4 mini · AR · May 1, 2026 · 1,400
#25 · Grok 4.20 · AR · May 1, 2026 · 1,399
#26 · GPT-5 · AR · May 1, 2026 · 1,393
#27 · GPT-5.4 nano · AR · May 1, 2026 · 1,393
#28 · minimax-m2.1-preview · AR · May 1, 2026 · 1,392
#29 · Gemini 3 Flash · AR · May 1, 2026 · 1,389
#30 · Qwen3.5 397B A17B · AR · May 1, 2026 · 1,387
#31 · Claude Sonnet 4.5 · AR · May 1, 2026 · 1,386
#32 · Claude Opus 4.1 · AR · May 1, 2026 · 1,385
#33 · Claude Opus 4 · AR · May 1, 2026 · 1,385
#34 · MiniMax-M2.5 · AR · May 1, 2026 · 1,383
#35 · deepseek-v3.2-thinking · AR · May 1, 2026 · 1,368
#36 · Qwen3.5 122B A10B · AR · May 1, 2026 · 1,363
#37 · GLM-4.6 · AR · May 1, 2026 · 1,355
#38 · Qwen3.5 27B · AR · May 1, 2026 · 1,350
#39 · GPT-5.2 · AR · May 1, 2026 · 1,335
#40 · DeepSeek Chat · AR · May 1, 2026 · 1,332
#41 · kimi-k2-thinking-turbo · AR · May 1, 2026 · 1,330
#42 · Claude Haiku 4.5 · AR · May 1, 2026 · 1,317
#43 · MiniMax-M2 · AR · May 1, 2026 · 1,304
#44 · MiMo-V2-Flash · AR · May 1, 2026 · 1,300
#45 · DeepSeek V3.2 Exp · AR · May 1, 2026 · 1,286
#46 · Qwen3-Coder 480B A35B · AR · May 1, 2026 · 1,281
#47 · KAT-Coder-Pro V1 · AR · May 1, 2026 · 1,258
#48 · Qwen3.5 35B A3B · AR · May 1, 2026 · 1,248
#49 · Trinity Large Thinking · AR · May 1, 2026 · 1,246
#50 · GPT-5.1 · AR · May 1, 2026 · 1,239
#51 · Gemini 3.1 Flash-Lite Preview · AR · May 1, 2026 · 1,238
#52 · Qwen3.5 Flash · AR · May 1, 2026 · 1,236
#53 · Grok 4.1 Fast · AR · May 1, 2026 · 1,234
#54 · Mistral Large 3 · AR · May 1, 2026 · 1,222
#55 · Grok 4.1 · AR · May 1, 2026 · 1,207
#56 · Gemini 2.5 Pro · AR · May 1, 2026 · 1,203
#57 · Devstral 2 · AR · May 1, 2026 · 1,199
#58 · Mercury 2 · AR · May 1, 2026 · 1,165
#59 · Grok 4 Fast · AR · May 1, 2026 · 1,149
#60 · Grok Code Fast · AR · May 1, 2026 · 1,139
#61 · devstral-medium-2507 · AR · May 1, 2026 · 1,091
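
Reading rating gaps

This page does not say how the Arena rating is computed. If one assumes an Elo-style logistic scale (base 10, 400-point scale), which is an assumption and not something the page states, a rating gap can be read as an approximate head-to-head preference probability. The sketch below applies that assumed formula to two gaps from the leaderboard above; the function name `expected_preference` is hypothetical.

```python
def expected_preference(rating_a, rating_b, scale=400.0):
    """Probability that A is preferred over B in a blind pairwise vote.

    ASSUMPTION: Elo-style logistic curve with base 10 and a 400-point scale.
    The page does not confirm that the Arena rating follows this model.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


# #1 Claude Opus 4.7 (1,561) vs #2 Claude Opus 4.6 (1,543): an 18-point gap.
top_two = expected_preference(1561, 1543)

# #1 Claude Opus 4.7 (1,561) vs #61 devstral-medium-2507 (1,091): a 470-point gap.
top_vs_bottom = expected_preference(1561, 1091)

print(f"top two: {top_two:.3f}")
print(f"top vs bottom: {top_vs_bottom:.3f}")
```

Under this assumption, the 18-point gap between the top two models corresponds to only slightly better than an even split, while the gap between first and last place is a strong preference. Small rank differences near the top should therefore not be over-read.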