Unbiased AI Bench (UAB) · Glass box for model evals.
Every leaderboard, with receipts.
Live · updated continuously
Benchmarks · /benchmarks/livebench-coding-generation

Coding generation

LiveBench code-generation slice derived from the official `LCB_generation` tasks.
Source · LiveBench
Version · livebench snapshot 2026-05-01
Scores · 32 models

Passport

Verified but aging. This is an objective signal: it reflects measurable task performance rather than public taste.

Source · LiveBench
Metric · Score (%)
Judge · Objective
Direction · Higher is better
Group ID · livebench_coding_generation_2026_04
Domain · Coding
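The passport fields above can be modeled as a small record type. This is a hypothetical schema sketch: the `BenchmarkPassport` name, the field types, and the `higher_is_better` boolean are our assumptions, not the site's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkPassport:
    """Hypothetical record for the passport fields shown above."""
    source: str          # e.g. "LiveBench"
    metric: str          # e.g. "Score (%)"
    judge: str           # "Objective" means no human or LLM grader in the loop
    higher_is_better: bool
    group_id: str        # comparisons are only valid within one group_id
    domain: str


# The passport on this page, as one record.
coding_generation = BenchmarkPassport(
    source="LiveBench",
    metric="Score (%)",
    judge="Objective",
    higher_is_better=True,
    group_id="livebench_coding_generation_2026_04",
    domain="Coding",
)
```

Keeping the record frozen mirrors how a benchmark passport should behave: a new snapshot gets a new `group_id` rather than mutating an old one.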

What it measures vs what it misses

✓ Measures

Objective code synthesis quality on recent LiveBench tasks.

✗ Misses

Editing ergonomics, latency, and price.

Why this counts

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Comparable-group rule

This percentile only compares models inside the exact benchmark/version group shown here. It is not a universal score.

What it misses

It does not fully capture repo-scale iteration, IDE ergonomics, or long debugging loops.
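The comparable-group rule can be made concrete with a short sketch: a percentile is only computed against scores that share the same benchmark/version group. The function name and the strict-less-than convention here are illustrative assumptions, not the site's published formula.

```python
def group_percentile(score: float, group_scores: list[float]) -> float:
    """Percent of models in the same group scoring strictly below `score`.

    Only meaningful when every entry in `group_scores` comes from the same
    benchmark/version group (one group_id); mixing groups would produce
    exactly the "universal score" the comparable-group rule warns against.
    """
    if not group_scores:
        raise ValueError("empty comparison group")
    below = sum(s < score for s in group_scores)
    return 100.0 * below / len(group_scores)


# Illustrative only: a 4-model group, not the real 32-entry leaderboard.
demo_group = [89.7, 80.8, 68.4, 42.3]
print(group_percentile(89.7, demo_group))  # → 75.0
```

Note that under this convention the lowest-scoring model in a group gets percentile 0, and tied scores do not count toward each other's rank.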

Leaderboard · this benchmark version

#1 · Gemini 2.5 Pro
LB · Mar 25, 2025
89.7%
#2 · DeepSeek Reasoner
LB · Feb 6, 2025
80.8%
#3 · o3 mini
LB · Feb 6, 2025
79.9%
#4 · GPT-4.5 Preview
LB · Feb 27, 2025
74.4%
#5 · Grok 3 mini
LB · Mar 14, 2025
74.4%
#6 · Claude Sonnet 3.7
LB · Feb 25, 2025
68.4%
#7 · o1 mini
LB · Feb 6, 2025
64.1%
#8 · o1
LB · Mar 4, 2025
61.2%
#9 · Claude Sonnet 3.5
LB · Feb 6, 2025
59.0%
#10 · Gemini 2.0 Pro Experimental
LB · Feb 6, 2025
59.0%
#11 · o1 Preview
LB · Feb 6, 2025
57.7%
#12 · Gemini 2.0 Flash
LB · Feb 6, 2025
55.5%
#13 · Grok 3
LB · Mar 14, 2025
54.5%
#14 · Gemini Experimental
LB · Feb 6, 2025
53.4%
#15 · GPT-4o
LB · Mar 27, 2025
51.9%
#16 · Claude Haiku 3.5
LB · Feb 6, 2025
48.7%
#17 · Claude Haiku 4.5
LB · Feb 6, 2025
48.7%
#18 · GPT-4 Turbo
LB · Feb 6, 2025
45.3%
#19 · Gemini 2.0 Flash-Lite
LB · Feb 27, 2025
44.9%
#20 · DeepSeek Chat
LB · Dec 11, 2024
42.3%
#21 · GPT-4o mini
LB · Dec 10, 2024
42.3%
#22 · Grok 2
LB · Feb 6, 2025
42.3%
#23 · Grok Beta
LB · Feb 6, 2025
42.3%
#24 · Grok 2 mini
LB · Feb 6, 2025
41.0%
#25 · Gemini 1.5 Pro
LB · Feb 6, 2025
38.5%
#26 · Claude Opus 3
LB · Feb 6, 2025
37.2%
#27 · Gemini 1.5 Flash
LB · Feb 6, 2025
37.2%
#28 · GPT-4
LB · Dec 10, 2024
34.6%
#29 · Gemini 1.5 Flash 8B
LB · Feb 6, 2025
31.4%
#30 · Claude Sonnet 3
LB · Feb 6, 2025
30.8%
#31 · GPT-3.5 Turbo
LB · Feb 6, 2025
29.5%
#32 · Claude Haiku 3
LB · Feb 6, 2025
26.9%
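Since the passport's direction is higher-is-better, the ranks above should follow non-increasing scores. A minimal sanity check over the 32 scores transcribed from this table (the variable names are ours) might look like:

```python
# Scores in rank order, transcribed from the leaderboard above.
scores = [
    89.7, 80.8, 79.9, 74.4, 74.4, 68.4, 64.1, 61.2,
    59.0, 59.0, 57.7, 55.5, 54.5, 53.4, 51.9, 48.7,
    48.7, 45.3, 44.9, 42.3, 42.3, 42.3, 42.3, 41.0,
    38.5, 37.2, 37.2, 34.6, 31.4, 30.8, 29.5, 26.9,
]

# Higher-is-better ranking means each score is <= the one ranked above it.
assert all(a >= b for a, b in zip(scores, scores[1:])), "ranks out of order"
assert len(scores) == 32  # matches the passport's score count

print(f"top: {scores[0]}%, bottom: {scores[-1]}%, spread: {scores[0] - scores[-1]:.1f}")
```

Ties (74.4, 59.0, 48.7, 42.3, 37.2) are legal under this invariant; the table breaks them by rank number, presumably by some secondary key such as listing date.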