UAB · Unbiased AI Bench · Glass box for model evals.
Every leaderboard, with receipts.
Workspaces
Live · updated continuously
Link-native durability

Saved compare views and follows stay portable as explicit bundle links. Reopening saved state should land on the same canonical compare or artifact URL, not on a locally reconstructed approximation.
Mode · bundle-first portability
Scope · saved compare views + follows
State · href-native reopening

Workspace bundle

Portable bundles stay link-native. Use them to preview a shared workspace, reopen the same compare URLs on another device, or import the snapshot without reconstructing intent from loose local fields.

Current workspace: 0 saved compare views · 0 watches · 0 pinned compare models
Preview or import a shared bundle

How it works

  • Copy a portable bundle link from the current device or workspace snapshot.
  • Open that URL anywhere to preview the exact saved compare and follow hrefs.
  • Import the bundle locally when you want the same workspaces and follows on that device.
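
The three steps above can be sketched as a round trip: pack the saved hrefs into a link, decode that link elsewhere for preview, then merge it into local state. This is a minimal illustration only. The base URL, the `bundle=` fragment key, the payload fields (`v`, `compares`, `follows`), and all function names are assumptions made for the example, not the product's actual link format.

```python
import base64
import json
from urllib.parse import urlsplit

# Hypothetical values: the real bundle link format is not documented on this page.
BUNDLE_BASE = "https://example.invalid/workspaces"
FRAGMENT_KEY = "bundle="


def encode_bundle_link(compare_hrefs, follow_hrefs):
    """Step 1: pack saved compare and follow hrefs into one shareable URL."""
    payload = {"v": 1, "compares": list(compare_hrefs), "follows": list(follow_hrefs)}
    raw = json.dumps(payload, separators=(",", ":"), sort_keys=True).encode("utf-8")
    # URL-safe base64 without padding keeps the token link-friendly.
    token = base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")
    return f"{BUNDLE_BASE}#{FRAGMENT_KEY}{token}"


def decode_bundle_link(url):
    """Step 2: recover the exact hrefs from the link for preview."""
    fragment = urlsplit(url).fragment
    if not fragment.startswith(FRAGMENT_KEY):
        raise ValueError("link does not carry a bundle")
    token = fragment[len(FRAGMENT_KEY):]
    token += "=" * (-len(token) % 4)  # restore the stripped padding
    return json.loads(base64.urlsafe_b64decode(token))


def import_bundle(local, bundle):
    """Step 3: merge bundle hrefs into local state, keeping order, skipping duplicates."""
    merged = {}
    for key in ("compares", "follows"):
        merged[key] = list(dict.fromkeys(local.get(key, []) + bundle.get(key, [])))
    return merged
```

Because the hrefs travel verbatim inside the link, previewing and importing never need to rebuild a compare view from loose local fields.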

What changed this week

alert
10 review items still need manual judgment

The product keeps parser and mapping ambiguity visible instead of silently guessing.

models
Arena moved via real benchmark movement

8 benchmark rows were added, 0 removed, and 118 existing rows changed value or evaluation date. Window: 2026-05-01T20:26:09Z -> 2026-05-01T22:04:34Z.

Evidence window: 2026-05-01T20:26:09Z -> 2026-05-01T22:04:34Z

models
Artificial Analysis moved via real benchmark movement

75 benchmark rows were added, 0 removed, and 20 existing rows changed value or evaluation date. Window: 2026-05-01T20:26:53Z -> 2026-05-01T22:05:29Z.

Evidence window: 2026-05-01T20:26:53Z -> 2026-05-01T22:05:29Z

models
MTEB moved via new benchmark coverage

1 benchmark row was added, 0 removed, and 0 existing rows changed value or evaluation date. Window: 2026-05-01T20:26:33Z -> 2026-05-01T22:04:57Z.

Evidence window: 2026-05-01T20:26:33Z -> 2026-05-01T22:04:57Z

models
Terminal-Bench moved via new benchmark coverage

1 benchmark row was added, 0 removed, and 0 existing rows changed value or evaluation date. Window: 2026-05-01T20:26:49Z -> 2026-05-01T22:05:24Z.

Evidence window: 2026-05-01T20:26:49Z -> 2026-05-01T22:05:24Z
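
The added / removed / changed counts in the four source entries above could be derived by diffing two snapshots taken at the start and end of the window. The sketch below assumes each snapshot is a mapping from a row key to a `(value, evaluation_date)` pair. The labeling rule in `classify` is also an assumption: it is consistent with the four entries shown here but is not a documented rule.

```python
def diff_rows(before, after):
    """Count rows added, removed, and changed between two snapshots.

    before / after: dict mapping a row key to (value, evaluation_date).
    The snapshot shape is hypothetical, chosen for illustration.
    """
    added = [k for k in after if k not in before]
    removed = [k for k in before if k not in after]
    # A row counts as changed when its value or its evaluation date differs.
    changed = [k for k in after if k in before and after[k] != before[k]]
    return {"added": len(added), "removed": len(removed), "changed": len(changed)}


def classify(counts):
    """Assumed labeling rule, inferred from the entries above."""
    if counts["changed"] or counts["removed"]:
        return "real benchmark movement"
    if counts["added"]:
        return "new benchmark coverage"
    return "no movement"
```

Under this rule, 8 added and 118 changed rows classify as real benchmark movement, while 1 added row with nothing changed classifies as new benchmark coverage.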

product
Initial glass-box matrix release

Added matrix homepage, comparable-group normalization, per-cell receipts, source pages, and custom composite preview.

Evidence window: 2026-04-16

models
Methodology contract published

Documented comparability rules, raw-vs-normalized behavior, and why unlike metrics are never averaged by default.

Evidence window: 2026-04-16
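
One way to read the "never averaged by default" rule: averaging is allowed only within a comparable group, and mixing groups takes an explicit opt-in. The sketch below is an illustration under that reading. The group labels, the `allow_mixed` flag, and both function names are hypothetical and are not taken from the published methodology.

```python
from collections import defaultdict
from statistics import mean


def average_by_group(cells):
    """Average values separately within each comparable group.

    cells: list of (comparable_group, value) pairs.
    """
    groups = defaultdict(list)
    for group, value in cells:
        groups[group].append(value)
    return {group: mean(values) for group, values in groups.items()}


def composite(cells, allow_mixed=False):
    """Single average across cells; refuses unlike metrics unless opted in."""
    groups = {group for group, _ in cells}
    if len(groups) > 1 and not allow_mixed:
        raise ValueError(f"refusing to average unlike metrics: {sorted(groups)}")
    return mean(value for _, value in cells)
```

For example, two accuracy percentages average normally, but adding an Elo-style rating to the same list raises an error instead of producing a meaningless number.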

models
Artificial Analysis ID rule adopted

Stable model and creator IDs are now the preferred external identity keys when available.

Evidence window: 2026-04-15
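
A "preferred when available" identity rule might look like the following: use the stable external IDs when both are present, otherwise fall back to a normalized name. The field names `aa_model_id`, `aa_creator_id`, and `name`, and the fallback itself, are assumptions for illustration. The entry above states only that stable IDs are preferred.

```python
def identity_key(record):
    """Return a tuple usable as a join key for a model record.

    record: dict with hypothetical fields aa_model_id, aa_creator_id, name.
    """
    model_id = record.get("aa_model_id")
    creator_id = record.get("aa_creator_id")
    if model_id and creator_id:
        # Stable external IDs survive renames and formatting changes.
        return ("aa", creator_id, model_id)
    # Assumed fallback: lowercase the name and collapse whitespace.
    name = " ".join(record.get("name", "").lower().split())
    return ("name", name)
```

Tagging the key with its origin (`"aa"` or `"name"`) keeps ID-based and name-based matches from colliding and makes weaker matches easy to flag for review.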