Thin verified coverage
This is an objective signal, so it mainly reflects measurable task performance rather than human preference.
source: OpenCompass
metric: Average score (%)
judge: Objective
direction: higher is better
group id: opencompass_mm_2026_04
domain: Document understanding
What it measures vs what it misses
✓ Measures: Document reasoning and multimodal understanding.
✗ Misses: Arena-style human preference.
Why this counts: It matters when the job is reading PDFs, tables, forms, or mixed-layout documents rather than plain chat.

Comparable-group rule: This percentile only compares models inside the exact benchmark/version group shown here. It is not a universal score.

What it misses: It does not stand in for end-to-end document workflow quality or search relevance.
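To make the comparable-group rule concrete, here is a minimal Python sketch of a within-group percentile. It assumes the percentile is the share of other models in the same group id with a strictly lower score (higher is better). The page does not state its exact formula, so treat that definition as an assumption. The `Result` type, the `group_percentile` helper, the model names, and the scores are all hypothetical placeholders, not OpenCompass results.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Result:
    model: str     # hypothetical model name
    group_id: str  # exact benchmark/version group, e.g. "opencompass_mm_2026_04"
    score: float   # average score (%); higher is better


def group_percentile(results: List[Result], model: str, group_id: str) -> Optional[float]:
    """Percentile of `model` among results that share exactly `group_id`.

    Assumed definition (not confirmed by the source): 100 * (number of
    other models in the group with a strictly lower score) / (number of
    other models in the group).
    """
    # Only models in the exact same benchmark/version group are comparable.
    group = [r for r in results if r.group_id == group_id]
    target = next((r for r in group if r.model == model), None)
    if target is None:
        raise ValueError(f"{model!r} has no result in group {group_id!r}")
    others = [r for r in group if r.model != model]
    if not others:
        return None  # a percentile is undefined for a group of one
    below = sum(1 for r in others if r.score < target.score)
    return 100.0 * below / len(others)


# Placeholder data for illustration only.
results = [
    Result("model-a", "opencompass_mm_2026_04", 72.0),
    Result("model-b", "opencompass_mm_2026_04", 65.5),
    Result("model-c", "opencompass_mm_2026_04", 80.1),
    Result("model-d", "some_other_group", 95.0),  # ignored: different group
]

if __name__ == "__main__":
    print(group_percentile(results, "model-a", "opencompass_mm_2026_04"))
```

In this sketch, model-d's higher score in a different group has no effect on model-a's percentile. That is the point of the rule: a percentile from one benchmark/version group is not comparable to one from another.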