UAB · Unbiased AI Bench
Glass box for model evals.
Every leaderboard, with receipts.
Live · updated continuously
Benchmarks · /benchmarks/livebench-reasoning

Reasoning

Recent reasoning and analysis tasks with verifiable answers.
Source · LiveBench
Version · livebench snapshot 2026-05-01
Scores · 32

Passport

Verified but aging
This is an objective signal, so it is mainly about measurable task performance rather than public taste.
source · LiveBench
metric · Score (%)
judge · Objective
direction · higher better
group id · livebench_reasoning_2026_03
domain · Reasoning / math / science

What it measures vs what it misses

✓ Measures

Objective reasoning correctness.

✗ Misses

User preference and style.

Why this counts

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Comparable-group rule

This percentile only compares models inside the exact benchmark/version group shown here. It is not a universal score.

What it misses

It still misses product usability, latency, and whether the model stays correct in messy real workflows.
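The comparable-group rule above can be illustrated with a minimal sketch. The page does not state how its percentile is computed, so the definition here (share of models in the same group scoring at or below a given model) is an assumption, as are the function name `percentiles_within_group` and the second group id `other_group`. The three LiveBench scores are taken from the leaderboard on this page.

```python
from collections import defaultdict

def percentiles_within_group(rows):
    """Compute a percentile for each model, restricted to its own group.

    rows: iterable of (group_id, model, score) tuples, higher score is better.
    Returns a dict mapping (group_id, model) to a percentile in (0, 100].
    The "at or below" percentile definition is an assumption, not the site's
    documented formula.
    """
    by_group = defaultdict(list)
    for group_id, model, score in rows:
        by_group[group_id].append((model, score))

    result = {}
    for group_id, entries in by_group.items():
        scores = [score for _, score in entries]
        for model, score in entries:
            # Count only models from the same benchmark/version group.
            at_or_below = sum(1 for s in scores if s <= score)
            result[(group_id, model)] = 100.0 * at_or_below / len(scores)
    return result

rows = [
    ("livebench_reasoning_2026_03", "o1 Preview", 76.5),
    ("livebench_reasoning_2026_03", "Gemini 2.5 Pro", 71.6),
    ("livebench_reasoning_2026_03", "GPT-3.5 Turbo", 35.5),
    ("other_group", "o1 Preview", 10.0),  # hypothetical group, for contrast
]
result = percentiles_within_group(rows)
```

Because each group is ranked separately, the same model can hold a different percentile in each group, and a low raw score in one group never affects the percentile in another. This is why the figure is not a universal score.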

Leaderboard · this benchmark version

#1 · o1 Preview · LB · Dec 10, 2024 · 76.5%
#2 · Gemini 2.5 Pro · LB · Apr 1, 2025 · 71.6%
#3 · o1 · LB · Mar 4, 2025 · 69.4%
#4 · Claude Sonnet 3.7 · LB · Apr 1, 2025 · 67.1%
#5 · GPT-4.5 Preview · LB · Apr 3, 2025 · 65.3%
#6 · Grok 3 · LB · Mar 18, 2025 · 62.3%
#7 · Gemini Experimental · LB · Feb 6, 2025 · 62%
#8 · Claude Sonnet 3.5 · LB · Feb 6, 2025 · 61.7%
#9 · GPT-4o · LB · Apr 1, 2025 · 59.5%
#10 · DeepSeek Reasoner · LB · Apr 3, 2025 · 59.2%
#11 · Claude Opus 3 · LB · Feb 6, 2025 · 57.3%
#12 · Gemini 2.0 Pro Experimental · LB · Apr 1, 2025 · 57.1%
#13 · o3 mini · LB · Feb 6, 2025 · 56.7%
#14 · o1 mini · LB · Feb 6, 2025 · 56.1%
#15 · Gemini 1.5 Pro · LB · Feb 6, 2025 · 55.5%
#16 · GPT-4 · LB · Dec 10, 2024 · 55.1%
#17 · GPT-4 Turbo · LB · Dec 10, 2024 · 54.5%
#18 · Grok 2 · LB · Feb 6, 2025 · 53.8%
#19 · Gemini 2.0 Flash · LB · Apr 7, 2025 · 51.7%
#20 · Grok Beta · LB · Feb 6, 2025 · 51.2%
#21 · Claude Haiku 3.5 · LB · Feb 6, 2025 · 51%
#22 · Claude Haiku 4.5 · LB · Feb 6, 2025 · 51%
#23 · Grok 2 mini · LB · Dec 10, 2024 · 49.8%
#24 · Gemini 1.5 Flash · LB · Dec 12, 2024 · 48.4%
#25 · GPT-4o mini · LB · Feb 6, 2025 · 47.4%
#26 · DeepSeek Chat · LB · Dec 11, 2024 · 46.4%
#27 · Gemini 2.0 Flash-Lite · LB · Apr 2, 2025 · 46.2%
#28 · Claude Sonnet 3 · LB · Dec 10, 2024 · 44.5%
#29 · Grok 3 mini · LB · Mar 14, 2025 · 43.9%
#30 · Gemini 1.5 Flash 8B · LB · Apr 3, 2025 · 43.6%
#31 · Claude Haiku 3 · LB · Dec 12, 2024 · 40.6%
#32 · GPT-3.5 Turbo · LB · Dec 10, 2024 · 35.5%