Coding generation

LiveBench code-generation slice derived from the official `LCB_generation` tasks.

Source · LiveBench
Version · livebench snapshot 2026-05-01
Scores · 32

Passport

Verified but aging

This is an objective signal, so it is mainly about measurable task performance rather than public taste.

source · LiveBench
metric · Score (%)
judge · Objective
direction · higher better
group id · livebench_coding_generation_2026_04
domain · Coding

What it measures vs what it misses

✓ Measures

Objective code synthesis quality on recent LiveBench tasks.

✗ Misses

Editing ergonomics. Latency and price.

Why this counts

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Comparable-group rule

This percentile only compares models inside the exact benchmark/version group shown here. It is not a universal score.

What it misses

It does not fully capture repo-scale iteration, IDE ergonomics, or long debugging loops.
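The comparable-group rule can be illustrated with a minimal sketch. The page does not say how its percentile is computed, so the formula here (the share of models in the same group scoring at or below the target) is an assumption. The record layout, the function name `percentile_in_group`, and the second group are hypothetical. The scores in the first group are a small subset copied from the leaderboard on this page.

```python
# Hypothetical records: (group_id, model, score_percent).
# The first group reuses a few scores from this page's leaderboard.
# The second group is invented purely to show that groups never mix.
RECORDS = [
    ("livebench_coding_generation_2026_04", "Gemini 2.5 Pro", 89.7),
    ("livebench_coding_generation_2026_04", "DeepSeek Reasoner", 80.8),
    ("livebench_coding_generation_2026_04", "GPT-4.5 Preview", 74.4),
    ("livebench_coding_generation_2026_04", "Grok 3 mini", 74.4),
    ("livebench_coding_generation_2026_04", "Claude Haiku 3", 26.9),
    ("some_other_group", "Model X", 99.0),
]


def percentile_in_group(records, group_id, model):
    """Percent of models in `group_id` scoring at or below `model`.

    Assumed definition, not necessarily the one the site uses.
    Only records with a matching group id are considered.
    """
    scores = {m: s for g, m, s in records if g == group_id}
    if model not in scores:
        raise KeyError(f"{model!r} has no score in group {group_id!r}")
    target = scores[model]
    at_or_below = sum(1 for s in scores.values() if s <= target)
    return 100.0 * at_or_below / len(scores)


if __name__ == "__main__":
    group = "livebench_coding_generation_2026_04"
    for name in ("Gemini 2.5 Pro", "GPT-4.5 Preview", "Claude Haiku 3"):
        print(name, percentile_in_group(RECORDS, group, name))
```

Under this definition, the invented `Model X` does not affect any percentile in the LiveBench group even though its score is higher, and asking for its percentile inside that group raises an error instead of silently comparing across groups.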

Leaderboard · this benchmark version

#1 · Gemini 2.5 Pro · LB · Mar 25, 2025 · 89.7%
#2 · DeepSeek Reasoner · LB · Feb 6, 2025 · 80.8%
#3 · o3 mini · LB · Feb 6, 2025 · 79.9%
#4 · GPT-4.5 Preview · LB · Feb 27, 2025 · 74.4%
#5 · Grok 3 mini · LB · Mar 14, 2025 · 74.4%
#6 · Claude Sonnet 3.7 · LB · Feb 25, 2025 · 68.4%
#7 · o1 mini · LB · Feb 6, 2025 · 64.1%
#8 · o1 · LB · Mar 4, 2025 · 61.2%
#9 · Claude Sonnet 3.5 · LB · Feb 6, 2025 · 59%
#10 · Gemini 2.0 Pro Experimental · LB · Feb 6, 2025 · 59%
#11 · o1 Preview · LB · Feb 6, 2025 · 57.7%
#12 · Gemini 2.0 Flash · LB · Feb 6, 2025 · 55.5%
#13 · Grok 3 · LB · Mar 14, 2025 · 54.5%
#14 · Gemini Experimental · LB · Feb 6, 2025 · 53.4%
#15 · GPT-4o · LB · Mar 27, 2025 · 51.9%
#16 · Claude Haiku 3.5 · LB · Feb 6, 2025 · 48.7%
#17 · Claude Haiku 4.5 · LB · Feb 6, 2025 · 48.7%
#18 · GPT-4 Turbo · LB · Feb 6, 2025 · 45.3%
#19 · Gemini 2.0 Flash-Lite · LB · Feb 27, 2025 · 44.9%
#20 · DeepSeek Chat · LB · Dec 11, 2024 · 42.3%
#21 · GPT-4o mini · LB · Dec 10, 2024 · 42.3%
#22 · Grok 2 · LB · Feb 6, 2025 · 42.3%
#23 · Grok Beta · LB · Feb 6, 2025 · 42.3%
#24 · Grok 2 mini · LB · Feb 6, 2025 · 41%
#25 · Gemini 1.5 Pro · LB · Feb 6, 2025 · 38.5%
#26 · Claude Opus 3 · LB · Feb 6, 2025 · 37.2%
#27 · Gemini 1.5 Flash · LB · Feb 6, 2025 · 37.2%
#28 · GPT-4 · LB · Dec 10, 2024 · 34.6%
#29 · Gemini 1.5 Flash 8B · LB · Feb 6, 2025 · 31.4%
#30 · Claude Sonnet 3 · LB · Feb 6, 2025 · 30.8%
#31 · GPT-3.5 Turbo · LB · Feb 6, 2025 · 29.5%
#32 · Claude Haiku 3 · LB · Feb 6, 2025 · 26.9%