UAB · Unbiased AI Bench · Glass box for model evals.
Every leaderboard, with receipts.
Live · updated continuously
Benchmarks · /benchmarks/opencompass-multimodal

Multimodal mix

Broad multimodal slice from OpenCompass.
Source · OpenCompass
Version · opencompass snapshot 2026-05-01
Scores · 15

Passport

Thin verified coverage
This is an objective signal, so it is mainly about measurable task performance rather than public taste.
source · OpenCompass
metric · Average score (%)
judge · Objective
direction · higher better
group id · opencompass_mm_2026_04
domain · Document understanding

What it measures vs what it misses

✓ Measures

Document reasoning and multimodal understanding.

✗ Misses

Arena-style human preference.

Why this counts

It matters when the job is reading PDFs, tables, forms, or mixed-layout documents rather than plain chat.

Comparable-group rule

This percentile only compares models inside the exact benchmark/version group shown here. It is not a universal score.

What it misses

It does not stand in for end-to-end document workflow quality or search relevance.
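
The comparable-group rule can be illustrated with a minimal sketch. The page does not state how the percentile is computed, so the definition used here (the share of models in the same group that score at or below the target model) is an assumption, as are the names `ROWS` and `percentile_in_group`. The first 15 rows are copied from this page's leaderboard. The final row is a made-up placeholder in a hypothetical second group, included only to show that scores from other groups are ignored.

```python
# Sketch only: this percentile definition is an assumption, not the site's
# documented formula.
GROUP_ID = "opencompass_mm_2026_04"

# (group_id, model, score %) rows. The first 15 come from this page's
# leaderboard; the last row is a made-up placeholder in another group.
ROWS = [
    (GROUP_ID, "Gemini 2.5 Pro", 78.3),
    (GROUP_ID, "GPT-5", 75.4),
    (GROUP_ID, "GPT-5.4", 75.4),
    (GROUP_ID, "GPT-5.4 mini", 75.4),
    (GROUP_ID, "GPT-5.4 nano", 75.4),
    (GROUP_ID, "Claude Opus 4.1", 74.2),
    (GROUP_ID, "Claude Opus 4", 74.2),
    (GROUP_ID, "Claude Opus 4.6", 74.2),
    (GROUP_ID, "Claude Opus 4.7", 74.2),
    (GROUP_ID, "Claude Sonnet 4", 72.7),
    (GROUP_ID, "Claude Sonnet 4.5", 72.7),
    (GROUP_ID, "Claude Sonnet 4.6", 72.7),
    (GROUP_ID, "Qwen3 235B A22B", 70.8),
    (GROUP_ID, "Grok 4", 68.1),
    (GROUP_ID, "Llama 4 Maverick", 67.4),
    ("hypothetical_other_group", "placeholder-model", 99.0),  # not real data
]


def percentile_in_group(rows, group_id, model):
    """Percent of models in `group_id` scoring at or below `model`.

    Only rows whose group id matches exactly are considered, so a score
    from any other benchmark/version group never affects the result.
    """
    scores = {m: s for g, m, s in rows if g == group_id}
    if model not in scores:
        raise KeyError(f"{model!r} has no score in group {group_id!r}")
    target = scores[model]
    at_or_below = sum(1 for s in scores.values() if s <= target)
    return 100.0 * at_or_below / len(scores)


if __name__ == "__main__":
    for name in ("Gemini 2.5 Pro", "GPT-5", "Llama 4 Maverick"):
        pct = percentile_in_group(ROWS, GROUP_ID, name)
        print(f"{name}: {pct:.1f}")
```

Because the lookup is keyed on the exact group id, asking for a model that has no score in that group raises an error instead of silently borrowing a score from another benchmark or version. That is one way to enforce the "not a universal score" caveat in code.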

Leaderboard · this benchmark version

#1 · Gemini 2.5 Pro · OC · Apr 8, 2026 · 78.3%
#2 · GPT-5 · OC · Apr 8, 2026 · 75.4%
#3 · GPT-5.4 · OC · Apr 8, 2026 · 75.4%
#4 · GPT-5.4 mini · OC · Apr 8, 2026 · 75.4%
#5 · GPT-5.4 nano · OC · Apr 8, 2026 · 75.4%
#6 · Claude Opus 4.1 · OC · Apr 8, 2026 · 74.2%
#7 · Claude Opus 4 · OC · Apr 8, 2026 · 74.2%
#8 · Claude Opus 4.6 · OC · Apr 8, 2026 · 74.2%
#9 · Claude Opus 4.7 · OC · Apr 8, 2026 · 74.2%
#10 · Claude Sonnet 4 · OC · Apr 8, 2026 · 72.7%
#11 · Claude Sonnet 4.5 · OC · Apr 8, 2026 · 72.7%
#12 · Claude Sonnet 4.6 · OC · Apr 8, 2026 · 72.7%
#13 · Qwen3 235B A22B · OC · Apr 8, 2026 · 70.8%
#14 · Grok 4 · OC · Apr 8, 2026 · 68.1%
#15 · Llama 4 Maverick · OC · Apr 8, 2026 · 67.4%