LB · benchmark platform

LiveBench

Contamination-aware, objective benchmark with recent questions and verifiable answers.

verification status

verified

Last checked May 1, 2026

Evidence ledger

Modalities: text, code
Cadence: release-based
API: available
Evaluations: 192
Verification: verified
Verified runtime: 186
Manual verified: 0
Relay / mirrored: 0
Backfilled: 6

Relay sources mirror another provider's public page; manual rows are checked against the cited page; backfilled rows are historical inserts; seeded rows are demo fixtures. Relay rows are supporting evidence, not first-party measurements.

Operational state

snapshot

Latest pull

May 1, 2026

parquet

parser

Fetched 60372 LiveBench judgment rows, deduped to 59388 latest question-model pairs, and mapped 186 benchmark records into the app catalog.

0.4.0
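
The parser message above describes reducing raw judgment rows to the latest row per question-model pair. A minimal sketch of that kind of dedup step follows. The field names `question_id`, `model`, and `judged_at` are assumptions made for illustration; the actual LiveBench parquet schema and the platform's parser are not shown on this page.

```python
def dedupe_latest(rows):
    """Keep one row per (question_id, model) pair, preferring the newest timestamp.

    Field names are hypothetical; adjust them to the real schema.
    """
    latest = {}
    for row in rows:
        key = (row["question_id"], row["model"])
        kept = latest.get(key)
        # ISO-8601 date strings compare correctly as plain strings.
        if kept is None or row["judged_at"] > kept["judged_at"]:
            latest[key] = row
    return list(latest.values())


# Toy input: two judgments for the same pair, plus one for another model.
rows = [
    {"question_id": "q1", "model": "model-a", "score": 0.0, "judged_at": "2026-04-01"},
    {"question_id": "q1", "model": "model-a", "score": 1.0, "judged_at": "2026-05-01"},
    {"question_id": "q1", "model": "model-b", "score": 1.0, "judged_at": "2026-05-01"},
]
deduped = dedupe_latest(rows)
```

Here the older `model-a` judgment is dropped, so three fetched rows become two question-model pairs, mirroring the 60372 to 59388 reduction reported above on a much smaller scale.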

verify

livebench verification finished with status verified.

May 1, 2026

verified

open

LiveBench exposed leaderboard models that are not yet mapped into the canonical registry: command-r-08-2024 (494), command-r-plus-08-2024 (494), gemma-2-27b-it (494), step-2-16k-202411 (493), amazon.nova-lite-v1:0 (418), amazon.nova-micro-v1:0 (418), amazon.nova-pro-v1:0 (418), azerogpt (418), deepseek-r1-distill-llama-70b (418), deepseek-r1-distill-qwen-32b (418).

Apr 7, 2025

model_alias
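
The open `model_alias` issue above lists leaderboard model names that have no entry in the canonical registry. One way such a report could be produced is sketched below. It assumes the number in parentheses is a per-model row count, which the page does not state, and the row shape and the `registry_aliases` set are hypothetical.

```python
from collections import Counter


def unmapped_models(leaderboard_rows, registry_aliases):
    """Return (model, row_count) pairs for models missing from the registry.

    Sorted by descending count, then by name. The interpretation of the
    count as "rows per model" is an assumption, not taken from the source.
    """
    counts = Counter(row["model"] for row in leaderboard_rows)
    missing = {m: n for m, n in counts.items() if m not in registry_aliases}
    return sorted(missing.items(), key=lambda item: (-item[1], item[0]))


# Toy input: one known model and two that the registry does not contain.
leaderboard_rows = (
    [{"model": "gemma-2-27b-it"}] * 3
    + [{"model": "azerogpt"}] * 2
    + [{"model": "known-model"}] * 5
)
registry_aliases = {"known-model"}
report = unmapped_models(leaderboard_rows, registry_aliases)
```

Sorting by count and then by name gives a stable report, so the same unmapped models appear in the same order from one pull to the next.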

Benchmarks from this source

Coding

Objective coding

Score

Reasoning

Objective reasoning

Score

Language

Objective language puzzles

Score

Instruction following

Objective instruction following

Score

Coding generation

Objective code generation

Score

Coding completion

Objective code completion

Score

Latest change explanation

livebench matched livebench-20260501T202619Z with no notable change causes detected.