Multimodal mix

Broad multimodal slice from OpenCompass.

Source · OpenCompass
Version · opencompass snapshot 2026-05-01
Scores · 15

Passport

Thin verified coverage

This is an objective signal, so it is mainly about measurable task performance rather than public taste.

source · OpenCompass
metric · Average score (%)
judge · Objective
direction · higher better
group id · opencompass_mm_2026_04
domain · Document understanding

What it measures vs what it misses

✓ Measures

Document reasoning and multimodal understanding.

✗ Misses

Arena-style human preference.

Why this counts

It matters when the job is reading PDFs, tables, forms, or mixed-layout documents rather than plain chat.

Comparable-group rule

This percentile only compares models inside the exact benchmark/version group shown here. It is not a universal score.

What it misses

It does not stand in for end-to-end document workflow quality or search relevance.
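The comparable-group rule can be illustrated with a short sketch. The page does not state the percentile formula, so this assumes one common definition: the share of the other models in the same group that score strictly lower. The `Score` class, the `percentile_in_group` function, and the `other_group_v1` group id are hypothetical names used only for illustration; the three scores in the `opencompass_mm_2026_04` group are taken from this benchmark version's leaderboard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Score:
    model: str
    group_id: str   # exact benchmark/version group
    value: float    # average score in percent, higher is better


def percentile_in_group(scores, model, group_id):
    """Percent of the other models in the same group that score strictly lower.

    Assumed definition; the source page does not specify its formula.
    """
    # Only rows from the exact group are comparable.
    group = [s for s in scores if s.group_id == group_id]
    target = next((s for s in group if s.model == model), None)
    if target is None:
        raise KeyError(f"{model!r} has no score in group {group_id!r}")
    if len(group) == 1:
        return 100.0  # nothing to compare against
    below = sum(1 for s in group if s.value < target.value)
    return 100.0 * below / (len(group) - 1)


SCORES = [
    Score("Gemini 2.5 Pro", "opencompass_mm_2026_04", 78.3),
    Score("GPT-5", "opencompass_mm_2026_04", 75.4),
    Score("Llama 4 Maverick", "opencompass_mm_2026_04", 67.4),
    # Hypothetical row from a different group; it must not affect the result.
    Score("Gemini 2.5 Pro", "other_group_v1", 10.0),
]

if __name__ == "__main__":
    for name in ("Gemini 2.5 Pro", "GPT-5", "Llama 4 Maverick"):
        print(name, percentile_in_group(SCORES, name, "opencompass_mm_2026_04"))
```

Under this definition, models with identical scores share the same percentile, and a score from any other benchmark or version never enters the comparison.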

Leaderboard · this benchmark version

#1 · Gemini 2.5 Pro · OC · Apr 8, 2026 · 78.3%
#2 · GPT-5 · OC · Apr 8, 2026 · 75.4%
#3 · GPT-5.4 · OC · Apr 8, 2026 · 75.4%
#4 · GPT-5.4 mini · OC · Apr 8, 2026 · 75.4%
#5 · GPT-5.4 nano · OC · Apr 8, 2026 · 75.4%
#6 · Claude Opus 4.1 · OC · Apr 8, 2026 · 74.2%
#7 · Claude Opus 4 · OC · Apr 8, 2026 · 74.2%
#8 · Claude Opus 4.6 · OC · Apr 8, 2026 · 74.2%
#9 · Claude Opus 4.7 · OC · Apr 8, 2026 · 74.2%
#10 · Claude Sonnet 4 · OC · Apr 8, 2026 · 72.7%
#11 · Claude Sonnet 4.5 · OC · Apr 8, 2026 · 72.7%
#12 · Claude Sonnet 4.6 · OC · Apr 8, 2026 · 72.7%
#13 · Qwen3 235B A22B · OC · Apr 8, 2026 · 70.8%
#14 · Grok 4 · OC · Apr 8, 2026 · 68.1%
#15 · Llama 4 Maverick · OC · Apr 8, 2026 · 67.4%