UAB · Unbiased AI Bench · Glass box for model evals.
Every leaderboard, with receipts.
LiveBench
Live · updated continuously
LB · benchmark platform

Contamination-aware, objective benchmark with recent questions and verifiable answers.
Verification status: verified (last checked May 1, 2026)

Evidence ledger

Modalities: text, code
Cadence: release-based
API: available
Evaluations: 192
Verification: verified
Verified runtime: 186
Manual verified: 0
Relay / mirrored: 0
Backfilled: 6

Relay sources mirror another provider's public page; manual rows are checked against the cited page; backfilled rows are historical inserts; seeded rows are demo fixtures. Relay rows are supporting evidence, not first-party measurements.
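
The ledger counts can be sanity-checked with a few lines of arithmetic. The sketch below assumes that the four provenance categories (verified runtime, manual verified, relay / mirrored, backfilled) partition the Evaluations total; the page does not state this outright, although the published numbers are consistent with it. The dictionary keys and function names are illustrative, not part of any LiveBench or UAB API.

```python
# Counts copied from the evidence ledger on this page.
# Key names are hypothetical labels chosen for this sketch.
ledger = {
    "verified_runtime": 186,
    "manual_verified": 0,
    "relay_mirrored": 0,
    "backfilled": 6,
}
evaluations = 192


def ledger_is_consistent(counts, total):
    """Return True when the provenance categories add up to the total.

    Assumption: the categories are mutually exclusive and exhaustive.
    """
    return sum(counts.values()) == total


def category_share(counts, category, total):
    """Fraction of all evaluations that fall into one provenance category."""
    return counts[category] / total


consistent = ledger_is_consistent(ledger, evaluations)
runtime_share = category_share(ledger, "verified_runtime", evaluations)
print(consistent, round(runtime_share, 4))
```

Under that assumption, 186 + 0 + 0 + 6 = 192, so the ledger is internally consistent and roughly 97% of the rows are runtime-verified.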

Operational state

snapshot
Latest pull

May 1, 2026

parquet
parser
Fetched 60372 LiveBench judgment rows, deduped to 59388 latest question-model pairs, and mapped 186 benchmark records into the app catalog.

0.4.0

ok
verify
livebench verification finished with status verified.

May 1, 2026

verified
open
LiveBench exposed leaderboard models that are not yet mapped into the canonical registry: command-r-08-2024 (494), command-r-plus-08-2024 (494), gemma-2-27b-it (494), step-2-16k-202411 (493), amazon.nova-lite-v1:0 (418), amazon.nova-micro-v1:0 (418), amazon.nova-pro-v1:0 (418), azerogpt (418), deepseek-r1-distill-llama-70b (418), deepseek-r1-distill-qwen-32b (418)

Apr 7, 2025

model_alias
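
The snapshot and open-issue entries above describe two steps: reducing raw judgment rows to the latest row per question-model pair, and flagging model names that are missing from the canonical registry. A minimal sketch of both steps follows. The field names (`question_id`, `model`, `timestamp`), the helper names, and the toy rows are all hypothetical; the actual parser and parquet schema are not shown on this page.

```python
from collections import Counter


def dedupe_latest(rows):
    """Keep only the most recent row for each (question_id, model) pair.

    Assumes every timestamp is an ISO 8601 string in the same format and
    timezone, so plain string comparison orders them chronologically.
    """
    latest = {}
    for row in rows:
        key = (row["question_id"], row["model"])
        if key not in latest or row["timestamp"] > latest[key]["timestamp"]:
            latest[key] = row
    return list(latest.values())


def unmapped_models(rows, registry):
    """Count rows per model name that is absent from the registry."""
    return Counter(r["model"] for r in rows if r["model"] not in registry)


# Toy rows invented for illustration; they are not LiveBench data.
rows = [
    {"question_id": "q1", "model": "model-a", "timestamp": "2026-04-01T00:00:00Z"},
    {"question_id": "q1", "model": "model-a", "timestamp": "2026-05-01T00:00:00Z"},
    {"question_id": "q2", "model": "model-a", "timestamp": "2026-05-01T00:00:00Z"},
    {"question_id": "q1", "model": "model-b", "timestamp": "2026-05-01T00:00:00Z"},
]
registry = {"model-b"}  # hypothetical canonical registry

latest = dedupe_latest(rows)
missing = unmapped_models(latest, registry)
print(len(latest), dict(missing))
```

Whether the parenthesised numbers next to each unmapped model on this page are row counts of this kind is not stated, so the counting step is only one plausible reading.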

Benchmarks from this source

Coding · Objective coding · Score
Reasoning · Objective reasoning · Score
Language · Objective language puzzles · Score
Instruction following · Objective instruction following · Score
Coding generation · Objective code generation · Score
Coding completion · Objective code completion · Score

Latest change explanation

livebench matched livebench-20260501T202619Z, and no notable causes of change were detected.