Reasoning

Recent reasoning and analysis tasks with verifiable answers.

Source · LiveBench
Version · livebench snapshot 2026-05-01
Scores · 32

Passport

Verified but aging

This is an objective signal, so it is mainly about measurable task performance rather than public taste.

source · LiveBench
metric · Score (%)
judge · Objective
direction · higher better
group id · livebench_reasoning_2026_03
domain · Reasoning / math / science

What it measures vs what it misses

✓ Measures

Objective reasoning correctness.

✗ Misses

User preference and style.

Why this counts

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Comparable-group rule

This percentile only compares models inside the exact benchmark/version group shown here. It is not a universal score.

What it misses

It still misses product usability, latency, and whether the model stays correct in messy real workflows.
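The comparable-group rule can be illustrated with a small Python sketch. The page does not state how its percentile is computed, so the formula here (share of other models in the same group that score strictly lower) is an assumption, as are the names `Score` and `percentile_in_group` and the second, hypothetical group id. The three scores in the LiveBench group are taken from the leaderboard on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Score:
    model: str
    group_id: str  # benchmark/version group the score belongs to
    value: float   # Score (%), higher is better


def percentile_in_group(scores, model, group_id):
    """Percent of the other models in `group_id` scoring strictly below `model`.

    Assumed formula for illustration; scores from other groups are ignored.
    """
    group = [s for s in scores if s.group_id == group_id]
    target = next((s for s in group if s.model == model), None)
    if target is None:
        raise KeyError(f"{model!r} has no score in group {group_id!r}")
    if len(group) < 2:
        return None  # a percentile is undefined with a single entry
    below = sum(1 for s in group if s.value < target.value)
    return 100.0 * below / (len(group) - 1)


GROUP = "livebench_reasoning_2026_03"
scores = [
    Score("o1 Preview", GROUP, 76.5),
    Score("Gemini 2.5 Pro", GROUP, 71.6),
    Score("GPT-3.5 Turbo", GROUP, 35.5),
    # Hypothetical entry in a different group; it must not affect the result.
    Score("o1 Preview", "hypothetical_other_group", 10.0),
]

print(percentile_in_group(scores, "Gemini 2.5 Pro", GROUP))  # 50.0
```

Filtering on `group_id` before ranking is the point of the rule: the same model can sit at a very different percentile in another benchmark or version, so the two numbers should not be compared directly.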

Leaderboard · this benchmark version

#1 · o1 Preview · LB · Dec 10, 2024 · 76.5%
#2 · Gemini 2.5 Pro · LB · Apr 1, 2025 · 71.6%
#3 · o1 · LB · Mar 4, 2025 · 69.4%
#4 · Claude Sonnet 3.7 · LB · Apr 1, 2025 · 67.1%
#5 · GPT-4.5 Preview · LB · Apr 3, 2025 · 65.3%
#6 · Grok 3 · LB · Mar 18, 2025 · 62.3%
#7 · Gemini Experimental · LB · Feb 6, 2025 · 62%
#8 · Claude Sonnet 3.5 · LB · Feb 6, 2025 · 61.7%
#9 · GPT-4o · LB · Apr 1, 2025 · 59.5%
#10 · DeepSeek Reasoner · LB · Apr 3, 2025 · 59.2%
#11 · Claude Opus 3 · LB · Feb 6, 2025 · 57.3%
#12 · Gemini 2.0 Pro Experimental · LB · Apr 1, 2025 · 57.1%
#13 · o3 mini · LB · Feb 6, 2025 · 56.7%
#14 · o1 mini · LB · Feb 6, 2025 · 56.1%
#15 · Gemini 1.5 Pro · LB · Feb 6, 2025 · 55.5%
#16 · GPT-4 · LB · Dec 10, 2024 · 55.1%
#17 · GPT-4 Turbo · LB · Dec 10, 2024 · 54.5%
#18 · Grok 2 · LB · Feb 6, 2025 · 53.8%
#19 · Gemini 2.0 Flash · LB · Apr 7, 2025 · 51.7%
#20 · Grok Beta · LB · Feb 6, 2025 · 51.2%
#21 · Claude Haiku 3.5 · LB · Feb 6, 2025 · 51%
#22 · Claude Haiku 4.5 · LB · Feb 6, 2025 · 51%
#23 · Grok 2 mini · LB · Dec 10, 2024 · 49.8%
#24 · Gemini 1.5 Flash · LB · Dec 12, 2024 · 48.4%
#25 · GPT-4o mini · LB · Feb 6, 2025 · 47.4%
#26 · DeepSeek Chat · LB · Dec 11, 2024 · 46.4%
#27 · Gemini 2.0 Flash-Lite · LB · Apr 2, 2025 · 46.2%
#28 · Claude Sonnet 3 · LB · Dec 10, 2024 · 44.5%
#29 · Grok 3 mini · LB · Mar 14, 2025 · 43.9%
#30 · Gemini 1.5 Flash 8B · LB · Apr 3, 2025 · 43.6%
#31 · Claude Haiku 3 · LB · Dec 12, 2024 · 40.6%
#32 · GPT-3.5 Turbo · LB · Dec 10, 2024 · 35.5%