Benchmark platform
LiveBench
Contamination-aware, objective benchmark with recent questions and verifiable answers.
Verification status: verified (last checked May 1, 2026)
Evidence ledger
Modalities: text, code
Cadence: release-based
API: available
Evaluations: 192
Verification: verified
Verified runtime: 186
Manual verified: 0
Relay / mirrored: 0
Backfilled: 6
Relay sources mirror another provider's public page; manual rows are checked against the cited page; backfilled rows are historical inserts; seeded rows are demo fixtures. Relay rows are supporting evidence, not first-party measurements.
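The ledger's provenance counts should account for every evaluation (186 runtime + 0 manual + 0 relay + 6 backfilled = 192). A minimal Python sketch of that tally, assuming each row carries a provenance label; the field name `provenance` and its string values are hypothetical, not the platform's actual schema:

```python
from collections import Counter

# Hypothetical provenance labels mirroring the ledger categories; the field
# name "provenance" and these values are assumptions, not the real schema.
PROVENANCE_CLASSES = ("runtime", "manual", "relay", "backfilled", "seeded")


def tally_provenance(rows):
    """Count rows per provenance class, rejecting unknown labels."""
    counts = Counter({cls: 0 for cls in PROVENANCE_CLASSES})
    for row in rows:
        label = row["provenance"]
        if label not in PROVENANCE_CLASSES:
            raise ValueError(f"unknown provenance class: {label!r}")
        counts[label] += 1
    return counts


def check_ledger(counts, total_evaluations):
    """True when the per-class counts account for every evaluation."""
    return sum(counts.values()) == total_evaluations


# Rebuild the ledger figures: 186 runtime-verified rows plus 6 backfilled rows.
rows = [{"provenance": "runtime"}] * 186 + [{"provenance": "backfilled"}] * 6
counts = tally_provenance(rows)
print(check_ledger(counts, 192))  # True
```

Rejecting unknown labels keeps a new category from being silently folded into an existing bucket.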
Operational state
Snapshot: Latest pull (parquet, May 1, 2026)
Parser 0.4.0 (ok): Fetched 60,372 LiveBench judgment rows, deduplicated them to 59,388 latest question-model pairs, and mapped 186 benchmark records into the app catalog.
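The parser summary implies that 984 of the 60,372 fetched rows (60,372 minus 59,388) were removed when keeping only the latest judgment per question-model pair. A minimal Python sketch of that deduplication step, assuming each row carries a question ID, a model name, and a judgment timestamp; the `Judgment` type and its field names are hypothetical, not LiveBench's actual schema:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Judgment:
    # Hypothetical row shape; the real judgment schema may use other names.
    question_id: str
    model: str
    score: float
    judged_at: datetime


def latest_per_pair(rows):
    """Keep the most recent judgment for each (question_id, model) pair."""
    latest = {}
    for row in rows:
        key = (row.question_id, row.model)
        current = latest.get(key)
        # A strictly newer timestamp replaces the stored row; on a tie the
        # row seen first is kept.
        if current is None or row.judged_at > current.judged_at:
            latest[key] = row
    return list(latest.values())


rows = [
    Judgment("q1", "model-a", 0.0, datetime(2026, 4, 1)),
    Judgment("q1", "model-a", 1.0, datetime(2026, 4, 20)),  # later re-judgment
    Judgment("q1", "model-b", 1.0, datetime(2026, 4, 1)),
]
deduped = latest_per_pair(rows)
print(len(rows), "->", len(deduped))  # 3 -> 2
```

A single pass with a dict keyed on the pair keeps memory proportional to the number of distinct pairs rather than the number of fetched rows.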
Verify (verified, May 1, 2026): LiveBench verification finished with status verified.
Open (model_alias, Apr 7, 2025): LiveBench exposed leaderboard models that are not yet mapped into the canonical registry: command-r-08-2024 (494), command-r-plus-08-2024 (494), gemma-2-27b-it (494), step-2-16k-202411 (493), amazon.nova-lite-v1:0 (418), amazon.nova-micro-v1:0 (418), amazon.nova-pro-v1:0 (418), azerogpt (418), deepseek-r1-distill-llama-70b (418), deepseek-r1-distill-qwen-32b (418)
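Finding unmapped models amounts to filtering leaderboard rows against the registry and counting per model. A minimal Python sketch, assuming the numbers in parentheses in the open model_alias issue are per-model row counts; the function names, sample rows, and registry contents are hypothetical. Sorting by count (descending) and then by name appears consistent with the order in which that issue lists the models.

```python
from collections import Counter


def unmapped_models(row_models, registry, limit=10):
    """Return (model, row_count) for models absent from the registry.

    `row_models` yields one model name per leaderboard row; `registry` is a
    set of canonical names. Ranked by count descending, then by name.
    """
    counts = Counter(m for m in row_models if m not in registry)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:limit]


def format_issue(ranked):
    """Render the ranking as "name (count)" entries, comma-separated."""
    return ", ".join(f"{name} ({count})" for name, count in ranked)


# Hypothetical sample: one model is already in the registry, two are not.
row_models = ["gemma-2-27b-it"] * 3 + ["command-r-08-2024"] * 5 + ["mapped-model"] * 7
registry = {"mapped-model"}
print(format_issue(unmapped_models(row_models, registry)))
# command-r-08-2024 (5), gemma-2-27b-it (3)
```

The `limit` of 10 matches the ten entries shown in the issue; whether the platform truncates at that size is an assumption.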
Benchmarks from this source
Coding: objective coding (metric: score)
Reasoning: objective reasoning (metric: score)
Language: objective language puzzles (metric: score)
Instruction following: objective instruction following (metric: score)
Coding generation: objective code generation (metric: score)
Coding completion: objective code completion (metric: score)
Latest change explanation
The livebench source matched snapshot livebench-20260501T202619Z, and no notable causes of change were detected.