UAB · Unbiased AI Bench
Glass box for model evals.
Every leaderboard, with receipts.
Search Arena
Live · updated continuously
Benchmarks · /benchmarks/arena-search

Blind preference arena for search-grounded answers and web-connected browsing behavior.
Source · Arena
Version · arena snapshot 2026-05-01
Scores · 28

Passport

Visible tradeoffs
This is a human preference signal, so it tells you what people liked side by side, not what is formally correct.
source · Arena
metric · Arena rating (rating)
judge · Human
direction · higher better
group id · arena_search_2026_q2
domain · Search / tool use

What it measures vs what it misses

✓ Measures

Observed user preference when models answer with live search or grounding behavior in the loop. How often a search-enabled response wins in head-to-head comparisons on web-aware tasks.

✗ Misses

Ground-truth retrieval accuracy or citation faithfulness in a fully objective sense. Tool-trace quality, latency, and hidden search-stack differences between providers.
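
To make the head-to-head framing above concrete, here is a minimal sketch of how a gap between two ratings could be read as an expected win rate. It assumes the Arena rating behaves like an Elo-style logistic scale (base 10, 400-point scale). This page does not state the rating model, so the formula, the `scale` default, and the function name `expected_win_rate` are illustrative assumptions rather than the arena's actual method.

```python
def expected_win_rate(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected share of head-to-head wins for A over B.

    Uses an Elo-style logistic curve. This is an ASSUMPTION: the page
    does not say which rating model produces its numbers.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Ratings copied from the leaderboard on this page.
opus_46 = 1255  # #1 · Claude Opus 4.6
gpt_55 = 1235   # #2 · GPT-5.5

print(f"{expected_win_rate(opus_46, gpt_55):.3f}")
```

Under this assumption, the 20-point gap between #1 and #2 corresponds to an expected win rate of only about 53%, which is a reminder that adjacent ranks can reflect a slim preference margin.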

Why this counts
It matters when the model must browse, call tools, and recover useful answers from external systems.

Comparable-group rule
This percentile only compares models inside the exact benchmark/version group shown here. It is not a universal score.

What it misses
It does not fully capture production agent orchestration, cost ceilings, or safety policy behavior.
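
The comparable-group rule above can be sketched as a percentile computed only over scores that share one group id. The page does not give its percentile formula, so the definition used here (the share of models in the group that score at or below the given model) is an assumption, as are the names `Score` and `percentile_in_group` and the out-of-group row.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Score:
    model: str
    group_id: str
    rating: float


def percentile_in_group(scores: list[Score], model: str, group_id: str) -> float:
    """Percentile of `model` among scores in the same group only.

    ASSUMED definition: percent of group members rated at or below the model.
    """
    group = [s for s in scores if s.group_id == group_id]
    target = next((s for s in group if s.model == model), None)
    if target is None:
        raise KeyError(f"{model!r} has no score in group {group_id!r}")
    at_or_below = sum(1 for s in group if s.rating <= target.rating)
    return 100.0 * at_or_below / len(group)


scores = [
    # Three rows copied from this page's leaderboard.
    Score("Claude Opus 4.6", "arena_search_2026_q2", 1255),
    Score("GPT-5.5", "arena_search_2026_q2", 1235),
    Score("GPT-4o", "arena_search_2026_q2", 1006),
    # Hypothetical row from a different group; it must not affect the result.
    Score("example-model", "other_benchmark_v1", 9999),
]

print(percentile_in_group(scores, "GPT-5.5", "arena_search_2026_q2"))
```

Filtering by group id before ranking is the point of the sketch: the out-of-group row has a far larger number, yet it never enters the comparison.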

Leaderboard · this benchmark version

Rank · Model · Metric · Date · Rating
#1 · Claude Opus 4.6 · AR · May 1, 2026 · 1,255
#2 · GPT-5.5 · AR · May 1, 2026 · 1,235
#3 · Claude Opus 4.7 · AR · May 1, 2026 · 1,233
#4 · Claude Sonnet 4.6 · AR · May 1, 2026 · 1,221
#5 · Gemini 3.1 Pro · AR · May 1, 2026 · 1,218
#6 · Gemini 3 Pro Preview · AR · May 1, 2026 · 1,210
#7 · Gemini 3 Flash · AR · May 1, 2026 · 1,208
#8 · Grok 4.3 · AR · May 1, 2026 · 1,205
#9 · Grok 4.20 · AR · May 1, 2026 · 1,203
#10 · GPT-5.1 · AR · May 1, 2026 · 1,201
#11 · GPT-5.4 · AR · May 1, 2026 · 1,201
#12 · Claude Opus 4.5 · AR · May 1, 2026 · 1,180
#13 · GPT-5.2 · AR · May 1, 2026 · 1,178
#14 · Grok 4.1 Fast · AR · May 1, 2026 · 1,176
#15 · Grok 4 Fast · AR · May 1, 2026 · 1,172
#16 · Claude Sonnet 4.5 · AR · May 1, 2026 · 1,152
#17 · Claude Opus 4.1 · AR · May 1, 2026 · 1,145
#18 · o3 · AR · May 1, 2026 · 1,144
#19 · Gemini 2.5 Pro · AR · May 1, 2026 · 1,143
#20 · Grok 4 · AR · May 1, 2026 · 1,142
#21 · ppl-sonar-reasoning-pro-high · AR · May 1, 2026 · 1,139
#22 · GPT-5 · AR · May 1, 2026 · 1,133
#23 · GPT-5.4 mini · AR · May 1, 2026 · 1,133
#24 · GPT-5.4 nano · AR · May 1, 2026 · 1,133
#25 · ppl-sonar-pro-high · AR · May 1, 2026 · 1,131
#26 · Claude Opus 4 · AR · May 1, 2026 · 1,128
#27 · diffbot-small-xl · AR · May 1, 2026 · 1,024
#28 · GPT-4o · AR · May 1, 2026 · 1,006