Unbiased AI Bench (UAB)
Glass box for model evals.
Every leaderboard, with receipts.

Disagreement

Live · updated continuously

Where the public record splits, on purpose.

The product should make the public record easier to read, especially when independent sources disagree loudly.

Public sources · 9
Open disputes · 10
Goal · surface disagreement

Build / data stamp

Read this before trusting a headline.

Data snapshot May 1, 2026 · Registry verification passed · 9 providers · 826 tracked models · Page refreshed May 7, 2026

If this stamp lags behind the repo, you are likely looking at an older build or cached deploy.
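The stamp check above can be sketched in a few lines. This is a minimal illustration, not the site's own tooling: `stamp_lag_days` and `looks_stale` are hypothetical helpers, and the repo date is a made-up value that you would read off the repo by hand.

```python
from datetime import date


def stamp_lag_days(stamp_snapshot: date, repo_snapshot: date) -> int:
    """Days by which the page's data stamp trails the repo's latest snapshot."""
    return (repo_snapshot - stamp_snapshot).days


def looks_stale(stamp_snapshot: date, repo_snapshot: date, tolerance_days: int = 0) -> bool:
    """True when the page stamp is older than the repo by more than the tolerance."""
    return stamp_lag_days(stamp_snapshot, repo_snapshot) > tolerance_days


# "Data snapshot May 1, 2026" is taken from the stamp on this page.
page_stamp = date(2026, 5, 1)
# Hypothetical: the newest snapshot date you found in the repo.
repo_latest = date(2026, 5, 4)

if looks_stale(page_stamp, repo_latest):
    lag = stamp_lag_days(page_stamp, repo_latest)
    print(f"Stamp lags the repo by {lag} day(s): likely an older build or cached deploy.")
else:
    print("Stamp matches the repo.")
```

A positive lag only suggests a stale build or cache; it does not say which one.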

Disagreement view · spread desk

Rankings get more honest when spread stays visible.

This view surfaces where public sources refuse to tell the same story. A wide spread is not noise to be sanded off. It is the main fact.

Arena (AR)
LiveBench (LB)
Artificial Analysis (AA)
BridgeBench (BB)
Terminal-Bench
LLMBase
Scale Labs (SL)
OpenCompass (OC)
MTEB

Where the public record splits

cross-benchmark spread = max percentile − min percentile
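The spread definition above can be sketched as a small function. This is a sketch under stated assumptions, not the site's implementation: the function name, the input shape (one percentile per source, on a 0 to 100 scale), and the sample numbers are all hypothetical.

```python
from typing import Dict, Tuple


def cross_benchmark_spread(percentiles: Dict[str, float]) -> Tuple[float, str, str]:
    """Return (max percentile - min percentile, highest source, lowest source)."""
    if len(percentiles) < 2:
        raise ValueError("need percentiles from at least two sources to measure spread")
    high = max(percentiles, key=percentiles.get)  # source ranking the model highest
    low = min(percentiles, key=percentiles.get)   # source ranking the model lowest
    return percentiles[high] - percentiles[low], high, low


# Hypothetical percentiles for one model across three sources (not real data).
example = {"source_a": 95.0, "source_b": 60.0, "source_c": 15.0}
spread, high, low = cross_benchmark_spread(example)
print(f"spread = {spread:.1f} (highest: {high}, lowest: {low})")
```

Returning the two extreme sources alongside the number keeps the disagreement traceable: a reader can see which sources produced the split, not just how wide it is.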

Chart: cross-benchmark spread per model, axis 0 to 100. Some model names were not captured.

Claude Opus 4.7 · Anthropic · 40.9% · cross-benchmark spread 100%
[model name missing] · Google · 56.8% · cross-benchmark spread 100%
[model name missing] · Google · 29.5% · cross-benchmark spread 100%
[model name missing] · OpenAI · 50% · cross-benchmark spread 100%
[model name missing] · OpenAI · 43.2% · cross-benchmark spread 100%
[model name missing] · OpenAI · 27.3% · cross-benchmark spread 99.2%
DeepSeek Reasoner · DeepSeek · 27.3% · cross-benchmark spread 96.8%

Source honesty scorecard

Not a moral rating. A quick check on how inspectable each source is when you need to dispute the headline number.

9 of 9 sources in the current registry

Benchmark and eval counts reflect what this app currently tracks for each source, not the source's full external catalog.


Source	Status	Benchmarks	Evals	(unlabeled)	Date	(unlabeled)
Arena	verified	11	773	no	May 1, 2026	1
Artificial Analysis	verified	7	690	yes	May 1, 2026	1
LiveBench	verified	6	192	yes	May 1, 2026	1
Scale Labs	verified	8	98	no	May 1, 2026	0
BridgeBench	verified	5	52	no	May 1, 2026	1
Terminal-Bench	verified	1	24	no	May 1, 2026	1
OpenCompass	verified	1	15	no	May 1, 2026	0
MTEB	verified	1	1	no	May 1, 2026	0
LLMBase	relay	0	0	no	May 1, 2026	0