UAB · Unbiased AI Bench
Glass box for model evals.
Every leaderboard, with receipts.
Live · updated continuously
TERMINAL-BENCH · benchmark platform

Terminal-Bench

Agent benchmark for hard, realistic multi-step tasks completed inside terminal environments.
Verification status: verified (last checked May 1, 2026)

Evidence ledger

Modalities: code
Cadence: release-based
API: not public
Evaluations: 24
Verification: verified
Verified runtime: 21
Manual verified: 0
Relay / mirrored: 0
Backfilled: 3

Relay sources mirror another provider's public page; manual rows are checked against the cited page; backfilled rows are historical inserts; seeded rows are demo fixtures. Relay rows are supporting evidence, not first-party measurements.
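
The provenance categories above can be tallied and cross-checked against the evaluation total (21 + 0 + 0 + 3 = 24 in the ledger). The following is a minimal sketch, assuming each row carries a `provenance` field; the field name, the label strings, and the `tally_provenance` helper are hypothetical and are not taken from the site's actual schema.

```python
from collections import Counter

# Hypothetical label for relay rows; the ledger calls these "Relay / mirrored".
RELAY_LABEL = "relay"


def tally_provenance(rows):
    """Count rows per provenance label and separate out relay rows.

    Relay rows are treated as supporting evidence only, so they are
    reported separately from the remaining rows.
    """
    counts = Counter(row["provenance"] for row in rows)
    relay = counts.get(RELAY_LABEL, 0)
    non_relay = sum(counts.values()) - relay
    return counts, non_relay, relay


# Illustrative rows shaped to match the ledger counts shown on this page.
rows = (
    [{"provenance": "verified_runtime"}] * 21
    + [{"provenance": "backfilled"}] * 3
)
counts, non_relay, relay = tally_provenance(rows)
print(dict(counts), non_relay, relay)
```

Keeping relay rows in a separate count makes it harder to mix mirrored numbers into a first-party total by accident.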

Operational state

  • snapshot (json): Latest pull, May 1, 2026.
  • parser (0.1.0, ok): Loaded 21 Terminal-Bench 2.0 benchmark records from verified rows.
  • verify (May 1, 2026, verified): terminal-bench verification finished with status verified.
  • open (May 1, 2026, model_alias): terminal-bench contains 7 unmapped model labels.
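
An open `model_alias` item of this kind can be reproduced by checking each source label against an alias table. This is only a sketch of one plausible approach: the `find_unmapped_labels` helper, the case-insensitive matching rule, and the placeholder labels are assumptions, not the site's implementation.

```python
def find_unmapped_labels(labels, alias_map):
    """Return the sorted, de-duplicated labels with no entry in alias_map.

    Matching ignores case and surrounding whitespace (an assumption made
    for this sketch; a real pipeline may normalize differently).
    """
    known = {alias.strip().lower() for alias in alias_map}
    return sorted({label for label in labels if label.strip().lower() not in known})


# Placeholder labels and canonical ids, for illustration only.
alias_map = {"model-a": "vendor/model-a", "Model-B": "vendor/model-b"}
labels = ["Model-A", "model-b ", "model-c", "model-c", "model-d"]
print(find_unmapped_labels(labels, alias_map))
```

Reporting the unmapped labels rather than dropping them keeps the affected rows visible until someone maps them.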

Benchmarks from this source

  • Terminal-Bench 2.0 · Agentic terminal coding · Accuracy

Latest change explanation

terminal-bench changed relative to terminal-bench-20260501T202649Z. Causes: parser_diff, benchmark_movement.

  • Parser output changed: The parser metadata or warnings shifted relative to the previous run.
  • Benchmark coverage or values moved: 1 benchmark row was added, 0 were removed, and 0 existing rows changed value or evaluation date.
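
The added / removed / changed counts in a movement summary like the one above can be derived by comparing two snapshots row by row. Below is a minimal sketch, assuming each snapshot is a dict keyed by a hypothetical `(model, benchmark)` pair whose value is a `(score, evaluation_date)` tuple. The `diff_rows` helper and all sample data are illustrative and are not drawn from the site.

```python
def diff_rows(previous, current):
    """Compare two snapshots and list the added, removed, and changed row keys.

    A row counts as changed when its key exists in both snapshots but its
    (score, evaluation_date) value differs.
    """
    prev_keys, curr_keys = previous.keys(), current.keys()
    return {
        "added": sorted(curr_keys - prev_keys),
        "removed": sorted(prev_keys - curr_keys),
        "changed": sorted(k for k in curr_keys & prev_keys if current[k] != previous[k]),
    }


# Placeholder snapshots: one row is added, none are removed or changed.
previous = {
    ("model-a", "Terminal-Bench 2.0"): (0.50, "2026-04-01"),
    ("model-b", "Terminal-Bench 2.0"): (0.40, "2026-04-01"),
}
current = dict(previous)
current[("model-c", "Terminal-Bench 2.0")] = (0.45, "2026-05-01")

movement = diff_rows(previous, current)
print({cause: len(keys) for cause, keys in movement.items()})
```

Comparing the full value tuple means a change in either the score or the evaluation date registers as movement, which matches the wording "changed value or evaluation date".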