UAB · Unbiased AI Bench · Glass box for model evals.
Every leaderboard, with receipts.
Live · updated continuously
Benchmarks · /benchmarks/arena-document

Document Arena

Blind preference arena for document-heavy prompts, PDFs, and long-form file understanding.
Source · Arena
Version · arena snapshot 2026-05-01
Scores · 19

Passport

Visible tradeoffs
This is a human preference signal, so it tells you what people preferred side by side, not what is formally correct.
source · Arena
metric · Arena rating (rating)
judge · Human
direction · higher better
group id · arena_document_2026_q2
domain · Document understanding

What it measures vs what it misses

✓ Measures

Observed user preference on document-grounded outputs. How often a model wins when users compare file-aware responses head to head.

✗ Misses

Objective document extraction accuracy. Fine-grained breakdowns across OCR, tables, charts, and legal or financial workflows.
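To make "how often a model wins" concrete, here is a minimal sketch that turns a rating gap into an expected head-to-head win rate. The page does not say how the Arena rating is computed, so the Elo-style logistic curve, the base of 10, and the scale of 400 are all assumptions for illustration; `expected_win_rate` is a hypothetical helper, not part of any Arena API.

```python
def expected_win_rate(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected probability that A is preferred over B.

    ASSUMPTION: Elo-style logistic curve with base 10 and scale 400.
    The benchmark page does not confirm this formula.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


# Ratings taken from the leaderboard on this page (#1 and #19).
top_rating = 1520
bottom_rating = 1406

# Under the assumed curve, a 114-point gap gives the top model
# roughly a two-in-three chance of being preferred.
win_rate = expected_win_rate(top_rating, bottom_rating)
print(f"{win_rate:.2f}")
```

Under this assumption even the widest gap on the board leaves the lower-rated model preferred in about a third of comparisons, which is worth keeping in mind when reading small rating differences between adjacent ranks.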

Why this counts
It matters when the job is reading PDFs, tables, forms, or mixed-layout documents rather than plain chat.

Comparable-group rule
This percentile only compares models inside the exact benchmark/version group shown here. It is not a universal score.

What it misses
It does not stand in for end-to-end document workflow quality or search relevance.
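The comparable-group rule above can be sketched as a percentile computed strictly within one group id. The page does not give its percentile formula, so the mid-rank definition below (ties share one value) is an assumption, and `percentiles_within_group` and the `some_other_group` row are hypothetical. The example uses only four of the 19 scores, so its numbers will not match the site's percentiles.

```python
from collections import defaultdict


def percentiles_within_group(rows):
    """Map (group_id, model) -> percentile among models in the same group.

    ASSUMPTION: mid-rank percentile, 100 * (below + 0.5 * equal) / n.
    The benchmark page does not state which formula it uses.
    """
    by_group = defaultdict(list)
    for group_id, model, score in rows:
        by_group[group_id].append((model, score))

    result = {}
    for group_id, entries in by_group.items():
        scores = [score for _, score in entries]
        n = len(scores)
        for model, score in entries:
            below = sum(1 for s in scores if s < score)
            equal = sum(1 for s in scores if s == score)  # includes the model itself
            result[(group_id, model)] = 100.0 * (below + 0.5 * equal) / n
    return result


rows = [
    # Subset of this page's leaderboard (group id from the Passport).
    ("arena_document_2026_q2", "Claude Opus 4.6", 1520),
    ("arena_document_2026_q2", "Kimi K2.6", 1457),
    ("arena_document_2026_q2", "muse-spark", 1457),
    ("arena_document_2026_q2", "GPT-5.2", 1406),
    # Hypothetical row from a different group; it must not affect the group above.
    ("some_other_group", "Hypothetical Model", 1600),
]

pct = percentiles_within_group(rows)
for (group_id, model), value in pct.items():
    print(f"{group_id:24s} {model:18s} {value:5.1f}")
```

Grouping before ranking is the point of the design: the 1,600 score in the hypothetical second group never enters the comparison for the Document Arena models, and the two models tied at 1,457 receive the same percentile.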

Leaderboard · this benchmark version

#1 · Claude Opus 4.6 · AR · May 1, 2026 · 1,520
#2 · Claude Opus 4.7 · AR · May 1, 2026 · 1,511
#3 · Claude Sonnet 4.6 · AR · May 1, 2026 · 1,500
#4 · GPT-5.5 · AR · May 1, 2026 · 1,487
#5 · GPT-5.4 · AR · May 1, 2026 · 1,480
#6 · Claude Opus 4.5 · AR · May 1, 2026 · 1,470
#7 · Kimi K2.6 · AR · May 1, 2026 · 1,457
#8 · muse-spark · AR · May 1, 2026 · 1,457
#9 · Claude Sonnet 4.5 · AR · May 1, 2026 · 1,450
#10 · Gemini 3.1 Pro Preview · AR · May 1, 2026 · 1,449
#11 · kimi-k2.5-thinking · AR · May 1, 2026 · 1,444
#12 · Gemini 3 Pro Preview · AR · May 1, 2026 · 1,442
#13 · Gemma 4 31B · AR · May 1, 2026 · 1,432
#14 · Gemini 2.5 Pro · AR · May 1, 2026 · 1,429
#15 · Grok 4.20 · AR · May 1, 2026 · 1,426
#16 · Claude Haiku 4.5 · AR · May 1, 2026 · 1,424
#17 · Gemini 3 Flash · AR · May 1, 2026 · 1,421
#18 · GPT-5.1 · AR · May 1, 2026 · 1,410
#19 · GPT-5.2 · AR · May 1, 2026 · 1,406