
PRBench Legal

Scale's professional reasoning benchmark for legal analysis and review work.

Source · Scale Labs
Version · scale-labs snapshot 2026-05-01
Scores · 13

Passport

Visible tradeoffs: This is a rubric-judged signal. It is more structured than arena-style preference voting, but it still depends on how the scoring rubric is written.

Source · Scale Labs
Metric · Score (%)
Judge · Rubric
Direction · Higher is better
Group ID · scale_prbench_legal_current
Domain · Professional reasoning

What it measures vs what it misses

✓ Measures

Applied legal reasoning on professional-domain tasks.

✗ Misses

Broad chat preference, and search freshness or real-time retrieval quality.

Why this counts: Applied legal reasoning on professional-domain tasks.
Comparable-group rule: This percentile only compares models within the exact benchmark/version group shown here. It is not a universal score.
What it misses: Broad chat preference.

Leaderboard · this benchmark version

#1 · Claude Opus 4.6 · 52.3% · SL · Apr 29, 2026
#2 · GPT-5.1 · 49.3% · SL · Apr 29, 2026
#3 · GPT-5 · 49% · SL · Apr 29, 2026
#4 · GPT-5.4 mini · 49% · SL · Apr 29, 2026
#5 · GPT-5.4 nano · 49% · SL · Apr 29, 2026
#6 · o3 · 48.6% · SL · Apr 29, 2026
#7 · GPT-5.4 · 44.4% · SL · Apr 29, 2026
#8 · Gemini 3.1 Pro · 44% · SL · Apr 29, 2026
#9 · Gemini 2.5 Pro · 41.4% · SL · Apr 29, 2026
#10 · Gemini 2.5 Flash · 41% · SL · Apr 29, 2026
#11 · Gemini 3 Pro Preview · 40.6% · SL · Apr 29, 2026
#12 · GPT-OSS 120B · 40.2% · SL · Apr 29, 2026
#13 · DeepSeek Reasoner · 36.6% · SL · Apr 29, 2026
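The comparable-group rule can be made concrete with a small sketch. The page does not say how its percentile is computed, so the definition here is an assumption. A model's percentile is taken as the share of the *other* models in the same benchmark/version group that score strictly lower, with higher being better. The scores are copied from the leaderboard for this version. `SCORES` and `group_percentile` are hypothetical names used only for illustration. Displayed values such as 49% may be rounded, so near-ties may not be exact.

```python
# Scores (%) copied from the leaderboard for this benchmark/version group only.
SCORES = {
    "Claude Opus 4.6": 52.3,
    "GPT-5.1": 49.3,
    "GPT-5": 49.0,
    "GPT-5.4 mini": 49.0,
    "GPT-5.4 nano": 49.0,
    "o3": 48.6,
    "GPT-5.4": 44.4,
    "Gemini 3.1 Pro": 44.0,
    "Gemini 2.5 Pro": 41.4,
    "Gemini 2.5 Flash": 41.0,
    "Gemini 3 Pro Preview": 40.6,
    "GPT-OSS 120B": 40.2,
    "DeepSeek Reasoner": 36.6,
}


def group_percentile(model: str, scores: dict[str, float]) -> float:
    """Hypothetical within-group percentile (assumed definition).

    Returns the percentage of *other* models in the same group whose
    score is strictly lower than `model`'s score (higher is better).
    Tied scores are not counted as "lower".
    """
    target = scores[model]
    others = [s for name, s in scores.items() if name != model]
    if not others:
        # A lone model has nothing to compare against; treat it as top.
        return 100.0
    below = sum(1 for s in others if s < target)
    return 100.0 * below / len(others)


if __name__ == "__main__":
    for name in SCORES:
        print(f"{name}: {group_percentile(name, SCORES):.1f}")
```

For example, `group_percentile("Claude Opus 4.6", SCORES)` returns `100.0`, because all 12 other models score lower. GPT-5 lands at about 66.7, because the two other 49% entries count as ties rather than as lower scores. Under this definition, adding a model, removing one, or mixing in scores from another snapshot shifts every percentile. That is why these numbers only mean something inside the group shown here.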