UAB · Unbiased AI Bench · Glass box for model evals.
Every leaderboard, with receipts.
Disagreement
Live · updated continuously

Where the public record splits, on purpose.

The product should make the public record easier to read, especially when independent sources disagree loudly.
Public sources · 9
Open disputes · 10
Goal · surface disagreement
Build / data stamp

Read this before trusting a headline.

Data snapshot May 1, 2026 · Registry verification passed · 9 providers · 826 tracked models · Page refreshed May 7, 2026

If this stamp lags behind the repo, you are likely looking at an older build or cached deploy.

Disagreement view · spread desk

Rankings get more honest when spread stays visible.

This view surfaces where public sources refuse to tell the same story. A wide spread is not noise to be sanded off. It is the main fact.

Arena · AR
LiveBench · LB
Artificial Analysis · AA
BridgeBench · BB
Terminal-Bench · TERMINAL-BENCH
LLMBase · LLMBASE
Scale Labs · SL
OpenCompass · OC
MTEB · MTEB

Where the public record splits

cross-benchmark spread = max percentile − min percentile
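
As a rough illustration of the formula above, here is a minimal Python sketch. It assumes each source assigns a model a percentile rank on a 0 to 100 scale; the page does not say how percentiles are computed, so that scale, the function name cross_benchmark_spread, and the sample numbers are all hypothetical.

```python
from typing import Mapping


def cross_benchmark_spread(percentiles: Mapping[str, float]) -> float:
    """Return max percentile minus min percentile across sources.

    `percentiles` maps a source name to one model's percentile rank in
    that source. The 0-100 scale is an assumption, not taken from the page.
    """
    if len(percentiles) < 2:
        # A single source cannot disagree with itself.
        raise ValueError("need at least two sources to measure spread")
    values = list(percentiles.values())
    return max(values) - min(values)


# Hypothetical percentiles for one model; not real data from any source.
example = {"Arena": 92.0, "LiveBench": 61.0, "Artificial Analysis": 78.0}
spread = cross_benchmark_spread(example)
print(f"spread = {spread}")  # 92.0 - 61.0 = 31.0
```

A larger result means the sources place the same model further apart; a value near zero means they roughly agree.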

Source honesty scorecard

Not a moral rating. A quick check on how inspectable each source is when you need to dispute the headline number.

9 of 9 sources in the current registry
Benchmark and eval counts reflect what this app currently tracks for each source, not the source's full external catalog.
Arena · verified · 11 · 773 · no · May 1, 2026 · 1
Artificial Analysis · verified · 7 · 690 · yes · May 1, 2026 · 1
LiveBench · verified · 6 · 192 · yes · May 1, 2026 · 1
Scale Labs · verified · 8 · 98 · no · May 1, 2026 · 0
BridgeBench · verified · 5 · 52 · no · May 1, 2026 · 1
Terminal-Bench · verified · 1 · 24 · no · May 1, 2026 · 1
OpenCompass · verified · 1 · 15 · no · May 1, 2026 · 0
MTEB · verified · 1 · 1 · no · May 1, 2026 · 0
LLMBase · relay · 0 · 0 · no · May 1, 2026 · 0