UAB · Unbiased AI Bench · Glass box for model evals.
Every leaderboard, with receipts.
Live · updated continuously
Benchmarks · /benchmarks/scale-vtb

VTB

Scale VTB leaderboard for image-grounded reasoning and multimodal understanding.
Source · Scale Labs
Version · scale-labs snapshot 2026-05-01
Scores · 12

Passport

Visible tradeoffs

This is a rubric-judged signal, so it is more structured than arena taste but still depends on the scoring rubric.
source · Scale Labs
metric · APR (%)
judge · Rubric
direction · higher better
group id · scale_vtb_current
domain · Vision understanding

What it measures vs what it misses

✓ Measures

Visual interpretation and reasoning over benchmark images and prompts.

✗ Misses

Image generation quality. Tool-use orchestration beyond the judged task.

Why this counts

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Comparable-group rule

This percentile only compares models inside the exact benchmark/version group shown here. It is not a universal score.

What it misses

It does not tell you whether the model can generate or edit images well.
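To make the comparable-group rule concrete, here is a minimal Python sketch of a percentile computed only within one benchmark/version group. The page does not state how its percentile is calculated, so the mid-rank formula below (ties count as half) is an assumption, and the function name `percentile_in_group` and the record layout are hypothetical. The scores are the APR (%) values listed on this page for the group `scale_vtb_current`.

```python
# Hypothetical record layout: (group_id, model, score).
# Scores are the APR (%) values shown on this page.
SCORES = [
    ("scale_vtb_current", "GPT-5.4", 29.2),
    ("scale_vtb_current", "Gemini 3.1 Pro", 29.0),
    ("scale_vtb_current", "Claude Opus 4.6", 27.5),
    ("scale_vtb_current", "Gemini 3 Pro Preview", 26.9),
    ("scale_vtb_current", "GPT-5", 17.0),
    ("scale_vtb_current", "GPT-5.4 mini", 17.0),
    ("scale_vtb_current", "GPT-5.4 nano", 17.0),
    ("scale_vtb_current", "o3", 13.7),
    ("scale_vtb_current", "Gemini 2.5 Pro", 11.8),
    ("scale_vtb_current", "Claude Sonnet 4", 4.5),
    ("scale_vtb_current", "Claude Sonnet 4.5", 4.5),
    ("scale_vtb_current", "Claude Sonnet 4.6", 4.5),
]


def percentile_in_group(records, group_id, model):
    """Mid-rank percentile of `model` among models in `group_id` only.

    Assumed formula (not confirmed by the page):
    100 * (models strictly below + 0.5 * models tied, including itself) / group size.
    """
    # Keep only records from the requested group; other groups are never mixed in.
    group = {m: s for g, m, s in records if g == group_id}
    if model not in group:
        raise KeyError(f"{model!r} has no score in group {group_id!r}")

    target = group[model]
    below = sum(1 for s in group.values() if s < target)
    tied = sum(1 for s in group.values() if s == target)  # includes the model itself
    return 100.0 * (below + 0.5 * tied) / len(group)


if __name__ == "__main__":
    for name in ("GPT-5.4", "GPT-5", "Claude Sonnet 4"):
        pct = percentile_in_group(SCORES, "scale_vtb_current", name)
        print(f"{name}: {pct:.1f}")
```

Under this assumed formula, the three models tied at 4.5% share the same percentile (12.5), and a model that has no score in the group raises an error rather than being ranked against a different benchmark version.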

Leaderboard · this benchmark version

#1 · GPT-5.4 · SL · Apr 29, 2026 · 29.2%
#2 · Gemini 3.1 Pro · SL · Apr 29, 2026 · 29%
#3 · Claude Opus 4.6 · SL · Apr 29, 2026 · 27.5%
#4 · Gemini 3 Pro Preview · SL · Apr 29, 2026 · 26.9%
#5 · GPT-5 · SL · Apr 29, 2026 · 17%
#6 · GPT-5.4 mini · SL · Apr 29, 2026 · 17%
#7 · GPT-5.4 nano · SL · Apr 29, 2026 · 17%
#8 · o3 · SL · Apr 29, 2026 · 13.7%
#9 · Gemini 2.5 Pro · SL · Apr 29, 2026 · 11.8%
#10 · Claude Sonnet 4 · SL · Apr 29, 2026 · 4.5%
#11 · Claude Sonnet 4.5 · SL · Apr 29, 2026 · 4.5%
#12 · Claude Sonnet 4.6 · SL · Apr 29, 2026 · 4.5%