Search Arena

Blind preference arena for search-grounded answers and web-connected browsing behavior.

Source · Arena
Version · arena snapshot 2026-05-01
Scores · 28

Passport

Visible tradeoffs

This is a human preference signal, so it tells you what people liked side by side, not what is formally correct.

source · Arena
metric · Arena rating (rating)
judge · Human
direction · higher better
group id · arena_search_2026_q2
domain · Search / tool use

What it measures vs what it misses

✓ Measures

Observed user preference when models answer with live search or grounding behavior in the loop. How often a search-enabled response wins in head-to-head comparisons on web-aware tasks.

✗ Misses

Ground-truth retrieval accuracy or citation faithfulness in a fully objective sense. Tool-trace quality, latency, and hidden search-stack differences between providers.
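To make the head-to-head reading of the rating concrete, here is a minimal sketch that turns a rating gap into an expected win probability. It assumes the Arena rating sits on an Elo-style logistic scale with a 400-point base; this page does not state the scale, so treat both the formula and the `scale` parameter as assumptions. The function name is hypothetical.

```python
def expected_win_probability(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Elo-style expected win probability of A over B.

    Assumption: ratings follow a logistic (Elo-like) scale where a
    `scale`-point gap corresponds to 10:1 odds. The page does not confirm this.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Ratings taken from the leaderboard on this page (#1 and #2).
top_rating = 1255
second_rating = 1235

p = expected_win_probability(top_rating, second_rating)
print(f"Expected preference share for #1 over #2: {p:.3f}")
```

Under that assumption, a 20-point gap corresponds to roughly a 53% expected preference share, which is why small rating differences near the top of the table should be read as close to a coin flip rather than a decisive win.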

Why this counts

It matters when the model must browse, call tools, and recover useful answers from external systems.

Comparable-group rule

This percentile only compares models inside the exact benchmark/version group shown here. It is not a universal score.

What it misses

It does not fully capture production agent orchestration, cost ceilings, or safety policy behavior.
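The comparable-group rule can be sketched as a percentile computed strictly per group id. The page does not say how its percentile is defined, so the mid-rank formula below is an assumption, as are the function name and the second group id, which exists only to show that scores from different groups never mix.

```python
from collections import defaultdict


def percentile_within_group(rows):
    """Return {(group_id, model): percentile} using only same-group ratings.

    Assumption: mid-rank percentile, 100 * (below + 0.5 * equal) / n,
    where `equal` includes the model itself. The page's own formula is unknown.
    """
    by_group = defaultdict(list)
    for group_id, model, rating in rows:
        by_group[group_id].append((model, rating))

    result = {}
    for group_id, entries in by_group.items():
        ratings = [rating for _, rating in entries]
        n = len(ratings)
        for model, rating in entries:
            below = sum(1 for r in ratings if r < rating)
            equal = sum(1 for r in ratings if r == rating)
            result[(group_id, model)] = 100.0 * (below + 0.5 * equal) / n
    return result


rows = [
    # Three rows taken from this page's leaderboard.
    ("arena_search_2026_q2", "Claude Opus 4.6", 1255),
    ("arena_search_2026_q2", "GPT-5.5", 1235),
    ("arena_search_2026_q2", "GPT-4o", 1006),
    # Hypothetical group and rating, for illustration only.
    ("hypothetical_other_group", "GPT-4o", 1300),
]

percentiles = percentile_within_group(rows)
for key, value in sorted(percentiles.items()):
    print(key, round(value, 1))
```

The same model can land at different percentiles in different groups, which is the reason the percentile should not be read as a universal score.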

Leaderboard · this benchmark version

#1 · Claude Opus 4.6 · AR · May 1, 2026 · 1,255
#2 · GPT-5.5 · AR · May 1, 2026 · 1,235
#3 · Claude Opus 4.7 · AR · May 1, 2026 · 1,233
#4 · Claude Sonnet 4.6 · AR · May 1, 2026 · 1,221
#5 · Gemini 3.1 Pro · AR · May 1, 2026 · 1,218
#6 · Gemini 3 Pro Preview · AR · May 1, 2026 · 1,210
#7 · Gemini 3 Flash · AR · May 1, 2026 · 1,208
#8 · Grok 4.3 · AR · May 1, 2026 · 1,205
#9 · Grok 4.20 · AR · May 1, 2026 · 1,203
#10 · GPT-5.1 · AR · May 1, 2026 · 1,201
#11 · GPT-5.4 · AR · May 1, 2026 · 1,201
#12 · Claude Opus 4.5 · AR · May 1, 2026 · 1,180
#13 · GPT-5.2 · AR · May 1, 2026 · 1,178
#14 · Grok 4.1 Fast · AR · May 1, 2026 · 1,176
#15 · Grok 4 Fast · AR · May 1, 2026 · 1,172
#16 · Claude Sonnet 4.5 · AR · May 1, 2026 · 1,152
#17 · Claude Opus 4.1 · AR · May 1, 2026 · 1,145
#18 · o3 · AR · May 1, 2026 · 1,144
#19 · Gemini 2.5 Pro · AR · May 1, 2026 · 1,143
#20 · Grok 4 · AR · May 1, 2026 · 1,142
#21 · ppl-sonar-reasoning-pro-high · AR · May 1, 2026 · 1,139
#22 · GPT-5 · AR · May 1, 2026 · 1,133
#23 · GPT-5.4 mini · AR · May 1, 2026 · 1,133
#24 · GPT-5.4 nano · AR · May 1, 2026 · 1,133
#25 · ppl-sonar-pro-high · AR · May 1, 2026 · 1,131
#26 · Claude Opus 4 · AR · May 1, 2026 · 1,128
#27 · diffbot-small-xl · AR · May 1, 2026 · 1,024
#28 · GPT-4o · AR · May 1, 2026 · 1,006