UAB · Unbiased AI Bench · Glass box for model evals.
Every leaderboard, with receipts.
Terminal-Bench 2.0
Live · updated continuously
Official Terminal-Bench 2.0 leaderboard for realistic multi-step terminal tasks.
Source · Terminal-Bench
Version · terminal-bench snapshot 2026-05-01
Scores · 24

Passport

Verified but aging
This is an objective signal, so it is mainly about measurable task performance rather than public taste.
Source · Terminal-Bench
Metric · Accuracy (%)
Judge · Objective
Direction · higher is better
Group ID · terminal_bench_2_live
Domain · Coding

What it measures vs what it misses

✓ Measures

End-to-end task success on hard terminal workflows that require planning, editing, debugging, and execution.

✗ Misses

IDE-native workflows, code review quality, and non-terminal product engineering work.

Why this counts
It tells you whether the model can generate, repair, and reason over code under real evaluation conditions rather than in curated marketing examples.

Comparable-group rule
This percentile only compares models inside the exact benchmark/version group shown here. It is not a universal score.

What it misses
It does not fully capture repo-scale iteration, IDE ergonomics, or long debugging loops.
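
To make the comparable-group rule concrete, here is a minimal sketch of a percentile computed only within one benchmark/version group. The exact formula this site uses is not stated on the page, so the definition below (share of other models in the same group that a model ties or beats) is an assumption, as are the names `Score` and `percentile_in_group` and the second group `other_benchmark_v1`, whose models and values are made up purely to show that scores from a different group are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Score:
    model: str
    group_id: str   # benchmark/version group, e.g. "terminal_bench_2_live"
    value: float    # accuracy in percent; higher is better


def percentile_in_group(scores, model, group_id):
    """Percent of the other models in `group_id` that `model` ties or beats.

    Assumed definition for illustration only; the site's real formula
    is not documented on this page.
    """
    # Only scores from the exact same group are comparable.
    group = [s for s in scores if s.group_id == group_id]
    target = next((s for s in group if s.model == model), None)
    if target is None:
        raise KeyError(f"{model!r} has no score in group {group_id!r}")

    others = [s for s in group if s.model != model]
    if not others:
        return 100.0  # a single-model group has nothing to compare against

    tied_or_beaten = sum(1 for s in others if s.value <= target.value)
    return 100.0 * tied_or_beaten / len(others)


# A few rows from this leaderboard, plus a hypothetical second group.
SCORES = [
    Score("GPT-5.5", "terminal_bench_2_live", 82.0),
    Score("GPT-5.3 Codex", "terminal_bench_2_live", 75.1),
    Score("Gemini 3 Pro Preview", "terminal_bench_2_live", 69.4),
    Score("GPT-OSS 20B", "terminal_bench_2_live", 3.4),
    Score("GPT-5.5", "other_benchmark_v1", 10.0),       # hypothetical
    Score("Example Model", "other_benchmark_v1", 90.0),  # hypothetical
]

if __name__ == "__main__":
    for name in ("GPT-5.5", "Gemini 3 Pro Preview", "GPT-OSS 20B"):
        pct = percentile_in_group(SCORES, name, "terminal_bench_2_live")
        print(f"{name}: {pct:.1f}")
```

Because the filter on `group_id` runs before any comparison, a model's low score in one group never drags down its percentile in another, which is the point of the rule: the number is only meaningful inside the group it was computed for.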

Leaderboard · this benchmark version

#1 · GPT-5.5
TERMINAL-BENCH · Apr 23, 2026
82%
#2 · GPT-5.3 Codex
TERMINAL-BENCH · Feb 6, 2026
75.1%
#3 · Gemini 3 Pro Preview
TERMINAL-BENCH · Jan 6, 2026
69.4%
#4 · Claude Opus 4.6
TERMINAL-BENCH · Feb 6, 2026
62.9%
#5 · GPT-5.2
TERMINAL-BENCH · Dec 18, 2025
62.9%
#6 · GPT-5.1
TERMINAL-BENCH · Nov 24, 2025
60.4%
#7 · Claude Opus 4.5
TERMINAL-BENCH · Nov 22, 2025
57.8%
#8 · Gemini 3 Flash Preview
TERMINAL-BENCH · Jan 7, 2026
51.7%
#9 · GPT-5
TERMINAL-BENCH · Nov 4, 2025
49.6%
#10 · GPT-5.4
TERMINAL-BENCH · Nov 4, 2025
49.6%
#11 · Claude Sonnet 4.5
TERMINAL-BENCH · Oct 31, 2025
42.8%
#12 · Claude Opus 4.1
TERMINAL-BENCH · Oct 31, 2025
38%
#13 · Claude Opus 4
TERMINAL-BENCH · Oct 31, 2025
38%
#14 · Claude Opus 4.7
TERMINAL-BENCH · Oct 31, 2025
38%
#15 · Gemini 2.5 Pro
TERMINAL-BENCH · Oct 31, 2025
32.6%
#16 · GPT-5.4 mini
TERMINAL-BENCH · Nov 4, 2025
31.9%
#17 · Claude Haiku 4.5
TERMINAL-BENCH · Nov 3, 2025
29.8%
#18 · Grok 4
TERMINAL-BENCH · Nov 2, 2025
27.2%
#19 · Grok Code Fast
TERMINAL-BENCH · Nov 3, 2025
25.8%
#20 · Qwen3-Coder 480B A35B
TERMINAL-BENCH · Nov 2, 2025
25.4%
#21 · GPT-OSS 120B
TERMINAL-BENCH · Nov 1, 2025
18.7%
#22 · Gemini 2.5 Flash
TERMINAL-BENCH · Nov 3, 2025
17.1%
#23 · GPT-5.4 nano
TERMINAL-BENCH · Nov 4, 2025
11.5%
#24 · GPT-OSS 20B
TERMINAL-BENCH · Nov 3, 2025
3.4%