Operational history

Changelog

Parser changes, mapping fixes, methodology changes, and product releases are logged here in the open, because changes to the data plumbing can change what the site appears to know.

Entries · 7
Categories · parser / mapping / product / methodology

What changed this week

alert

10 review items still need manual judgment

The product keeps parser and mapping ambiguity visible instead of silently guessing.

models

Arena: real benchmark movement

8 benchmark rows were added, 0 removed, and 118 existing rows changed value or evaluation date. Window: 2026-05-01T20:26:09Z -> 2026-05-01T22:04:34Z.

models

Artificial Analysis: real benchmark movement

75 benchmark rows were added, 0 removed, and 20 existing rows changed value or evaluation date. Window: 2026-05-01T20:26:53Z -> 2026-05-01T22:05:29Z.

models

MTEB: new benchmark coverage

1 benchmark row was added, 0 removed, and 0 existing rows changed value or evaluation date. Window: 2026-05-01T20:26:33Z -> 2026-05-01T22:04:57Z.

models

Terminal-Bench: new benchmark coverage

1 benchmark row was added, 0 removed, and 0 existing rows changed value or evaluation date. Window: 2026-05-01T20:26:49Z -> 2026-05-01T22:05:24Z.
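The row counts in these summaries can be reproduced by diffing two snapshots keyed by model. A minimal Python sketch follows, assuming each snapshot maps a model ID to a (value, evaluation date) pair. The `Row` type and all function names are hypothetical, and the labeling rule (any changed existing row counts as real movement, additions alone count as new coverage) is inferred from the four entries above rather than taken from the site's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """Hypothetical benchmark row: one score for one model."""
    value: float
    evaluated_on: str  # ISO 8601 date of the evaluation


def diff_snapshots(old: dict[str, Row], new: dict[str, Row]) -> dict[str, int]:
    """Count rows added, removed, and changed between two snapshots keyed by model ID."""
    added = new.keys() - old.keys()
    removed = old.keys() - new.keys()
    # A shared row counts as changed if its value or its evaluation date differs.
    changed = [k for k in old.keys() & new.keys() if old[k] != new[k]]
    return {"added": len(added), "removed": len(removed), "changed": len(changed)}


def classify(counts: dict[str, int]) -> str:
    """Assumed labeling rule, inferred from the changelog wording."""
    if counts["changed"] > 0:
        return "real benchmark movement"
    if counts["added"] > 0:
        return "new benchmark coverage"
    if counts["removed"] > 0:
        return "benchmark coverage removed"
    return "no benchmark movement"


def plural(n: int, noun: str) -> str:
    return f"{n} {noun}" if n == 1 else f"{n} {noun}s"


def summarize(counts: dict[str, int]) -> str:
    """Render the summary sentence with correct singular/plural agreement."""
    verb = "was" if counts["added"] == 1 else "were"
    return (
        f"{plural(counts['added'], 'benchmark row')} {verb} added, "
        f"{counts['removed']} removed, and "
        f"{plural(counts['changed'], 'existing row')} changed value or evaluation date."
    )
```

For example, a snapshot that gains one model and changes nothing else is labeled "new benchmark coverage" and summarized as "1 benchmark row was added, 0 removed, and 0 existing rows changed value or evaluation date."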

product

Initial glass-box matrix release

Added matrix homepage, comparable-group normalization, per-cell receipts, source pages, and custom composite preview.

methodology

Methodology contract published

Documented comparability rules, raw-vs-normalized behavior, and why unlike metrics are never averaged by default.

mapping

Artificial Analysis ID rule adopted

Stable model and creator IDs are now the preferred external identity keys when available.

2026-04-16

product

Initial glass-box matrix release

Added matrix homepage, comparable-group normalization, per-cell receipts, source pages, and custom composite preview.
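Comparable-group normalization with per-cell receipts could look roughly like the following Python sketch. It assumes min-max scaling to [0, 1] inside a single group and a hypothetical `Cell` record that carries the raw value next to the normalized one; the site's actual scaling method is not stated in this entry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """Hypothetical matrix cell that keeps its own receipt."""
    model: str
    group: str         # comparable group the score was normalized within
    raw: float         # value exactly as published by the source
    normalized: float  # rescaled to [0, 1] using only this group's scores


def normalize_group(group: str, scores: dict[str, float],
                    higher_is_better: bool = True) -> list[Cell]:
    """Min-max rescale scores that share one metric and scale.

    Assumption: every value in `scores` belongs to the same comparable group;
    rescaling values from different groups together would be meaningless.
    """
    if not scores:
        return []
    lo, hi = min(scores.values()), max(scores.values())
    cells = []
    for model, raw in sorted(scores.items()):
        if hi == lo:
            norm = 1.0  # all tied: no spread to rescale, so every model sits at the top
        else:
            x = (raw - lo) / (hi - lo)
            norm = x if higher_is_better else 1.0 - x
        cells.append(Cell(model, group, raw, norm))
    return cells
```

Keeping `raw` on every cell means the matrix can always show the published number behind a normalized value.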

2026-04-16

methodology

Methodology contract published

Documented comparability rules, raw-vs-normalized behavior, and why unlike metrics are never averaged by default.
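The rule that unlike metrics are never averaged by default can be enforced where a composite is built. A minimal Python sketch, assuming scores arrive as hypothetical (comparable group, normalized value) pairs and that a custom composite passes an explicit opt-in flag when a user deliberately mixes groups; the error type and parameter names are illustrative.

```python
class IncomparableMetricsError(ValueError):
    """Raised when a composite would mix metrics from different comparable groups."""


def composite(scores: list[tuple[str, float]], allow_cross_group: bool = False) -> float:
    """Average normalized scores, refusing to mix comparable groups unless asked.

    `scores` holds (comparable_group, normalized_value) pairs for one model.
    """
    if not scores:
        raise ValueError("no scores to combine")
    groups = {group for group, _ in scores}
    if len(groups) > 1 and not allow_cross_group:
        # Default behavior: unlike metrics are never averaged silently.
        raise IncomparableMetricsError(
            f"refusing to average unlike metrics: {sorted(groups)}"
        )
    return sum(value for _, value in scores) / len(scores)
```

Raising instead of returning a number keeps the refusal visible to the caller, which matches the document's preference for surfacing ambiguity over guessing.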

2026-04-15

mapping

Artificial Analysis ID rule adopted

Stable model and creator IDs are now the preferred external identity keys when available.
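A sketch of how an ID-first rule might choose a join key, in Python. The field names passed in (`model_id`, `model_name`, and so on) are hypothetical placeholders, not Artificial Analysis's actual schema.

```python
def identity_key(record: dict, id_field: str, name_field: str) -> str:
    """Return the join key for a model or creator record.

    Prefers the stable external ID when the source provides one; otherwise
    falls back to a whitespace- and case-normalized display name.
    """
    stable_id = record.get(id_field)
    if stable_id:
        return f"id:{stable_id}"
    # Weaker fallback: display names can be renamed or reused, so these keys
    # are the ones most likely to need manual review.
    return "name:" + " ".join(record[name_field].lower().split())
```

Called as `identity_key(record, "model_id", "model_name")` for models and `identity_key(record, "creator_id", "creator_name")` for creators, a renamed model keeps the same key as long as its stable ID is unchanged.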

2026-04-15

parser

BridgeBench parser fallback added

Added alternate selectors for category headers after leaderboard markup drift.
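An ordered fallback list is one common way to survive markup drift: try the original selector first, then alternates, and fail loudly if none match. The Python sketch below uses regular expressions as stand-ins for CSS selectors so it stays standard-library only; the patterns and markup are hypothetical, and a production parser would more likely use a real HTML parser.

```python
import re

# Hypothetical patterns for category headers: the original markup first,
# then alternates for drifted markup.
HEADER_PATTERNS = [
    re.compile(r'<h2 class="category">(.*?)</h2>', re.S),
    re.compile(r'<div class="category-header">(.*?)</div>', re.S),
    re.compile(r'<th data-category="[^"]*">(.*?)</th>', re.S),
]


def extract_category_headers(html: str,
                             patterns: list = HEADER_PATTERNS) -> tuple[list[str], int]:
    """Return headers from the first pattern that matches, plus that pattern's index.

    Raises if nothing matches, so drift surfaces instead of yielding an
    empty (and silently wrong) category list.
    """
    for index, pattern in enumerate(patterns):
        found = [text.strip() for text in pattern.findall(html)]
        if found:
            return found, index
    raise LookupError("no category header pattern matched; markup may have drifted again")
```

Returning the index of the winning pattern lets a receipt record which selector produced the data.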

2026-04-16

parser

LiveBench worker now feeds app bundle

LiveBench records are now generated from the official public leaderboard dataset and merged into the catalog as a checked-in fragment with snapshot and parser metadata.
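A checked-in fragment of this kind might be shaped roughly as follows. This Python sketch is an assumption throughout: the key names, the SHA-256 snapshot hash, and the replace-by-source merge are illustrative choices, not the project's documented format.

```python
import hashlib
import json


def build_fragment(records: list[dict], source_url: str, parser_version: str,
                   snapshot_bytes: bytes) -> dict:
    """Wrap parsed records with snapshot and parser metadata (hypothetical layout)."""
    return {
        "source": "livebench",
        "snapshot": {
            "url": source_url,
            # Hash of the exact dataset bytes parsed, so a receipt can be checked later.
            "sha256": hashlib.sha256(snapshot_bytes).hexdigest(),
        },
        "parser": {"version": parser_version},
        # Stable ordering keeps diffs of the checked-in file small and reviewable.
        "records": sorted(records, key=lambda r: (r["model"], r["category"])),
    }


def to_json(fragment: dict) -> str:
    """Serialize deterministically for committing to the repository."""
    return json.dumps(fragment, sort_keys=True, indent=2) + "\n"


def merge_fragment(catalog: dict, fragment: dict) -> dict:
    """Return a new catalog in which this source's fragment replaces any older one."""
    merged = {k: v for k, v in catalog.items() if k != fragment["source"]}
    merged[fragment["source"]] = fragment
    return merged
```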

2026-04-16

mapping

Provider model registry added

Current-model coverage now merges a generated registry sourced from official provider docs, with per-model verification receipts and a review queue for newly discovered names.
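The merge-plus-review-queue behavior can be sketched in Python as below. It assumes a registry entry counts as "known" when its ID already exists in the catalog; the entry fields and names are hypothetical, and the changelog does not say how queued names are displayed.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RegistryEntry:
    model_id: str
    provider: str
    receipt_url: str  # provider doc page where the model was verified


@dataclass
class MergeResult:
    coverage: dict[str, RegistryEntry] = field(default_factory=dict)
    review_queue: list[RegistryEntry] = field(default_factory=list)


def merge_registry(known_ids: set[str], registry: list[RegistryEntry]) -> MergeResult:
    """Merge verified registry entries; queue unfamiliar names for a human."""
    result = MergeResult()
    for entry in registry:
        if entry.model_id in known_ids:
            result.coverage[entry.model_id] = entry
        else:
            # Newly discovered name: surface it rather than guess a mapping.
            result.review_queue.append(entry)
    return result
```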

2026-04-16

methodology

Current models separated from historical benchmark identities

New provider-verified variants such as GPT-5.4 and Claude Sonnet 4.5 now remain distinct from older benchmarked IDs, so legacy scores are not silently relabeled as newer models.
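The failure this entry guards against is easy to reproduce with loose name matching. The Python sketch below contrasts a naive prefix match, which lets a newer variant inherit an older ID's score, with an exact-ID join that leaves the newer variant unscored. The model IDs and score in the usage note are hypothetical.

```python
from typing import Optional


def naive_attach(current_models: list[str],
                 scores: dict[str, float]) -> dict[str, Optional[float]]:
    """The pitfall: prefix matching lets a newer variant inherit an older ID's score."""
    out: dict[str, Optional[float]] = {}
    for model in current_models:
        match = next((hid for hid in scores if model.startswith(hid)), None)
        out[model] = scores[match] if match is not None else None
    return out


def attach_scores(current_models: list[str],
                  scores: dict[str, float]) -> dict[str, Optional[float]]:
    """Exact-ID join: a variant with no benchmark row of its own stays unscored."""
    return {model: scores.get(model) for model in current_models}
```

With a single historical row for `model-4`, `naive_attach` credits `model-4.5` with that score, while `attach_scores` reports `None` for `model-4.5` and keeps the score on `model-4`.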


Watchlists

Followed items reopen from their canonical URL first. Bundle export still works, but the durable state is the href plus deterministic latest-delta links, not a rebuilt local compare preset.
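Deterministic links make the href itself the durable state: the same compare set should always produce the same URL. A minimal Python sketch, assuming hypothetical `/compare` and `/deltas` paths and query parameter names; the site's real URL scheme is not given here.

```python
from urllib.parse import urlencode


def compare_href(model_ids: list[str], base: str = "/compare") -> str:
    """Canonical compare URL: de-duplicated, sorted IDs, so the same set always
    yields the same href regardless of the order items were followed."""
    ids = sorted(set(model_ids))
    return f"{base}?{urlencode({'models': ','.join(ids)})}"


def latest_delta_href(model_id: str, base: str = "/deltas") -> str:
    """Link that always resolves to the newest delta for one followed model."""
    return f"{base}?{urlencode({'model': model_id, 'since': 'latest'})}"
```

Because `urlencode` percent-encodes the comma, `compare_href(["b", "a", "b"])` returns `/compare?models=a%2Cb`.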

Saved compare views

Save a compare workspace to keep a shortlist around.

Workspace bundle

Portable bundles stay link-native. Use them to preview a shared workspace, reopen the same compare URLs on another device, or import the snapshot without reconstructing intent from loose local fields.
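A link-native bundle can be as simple as a versioned list of hrefs that is reopened as-is on import. The Python sketch below is hypothetical: the `version`, `compare`, and `watch` keys and the version check are illustrative, not the site's export format.

```python
import json

BUNDLE_VERSION = 1  # hypothetical format version


def export_bundle(compare_hrefs: list[str], watch_hrefs: list[str]) -> str:
    """Serialize a workspace as canonical links only, in a stable order."""
    return json.dumps(
        {"version": BUNDLE_VERSION, "compare": sorted(compare_hrefs), "watch": sorted(watch_hrefs)},
        sort_keys=True,
    )


def import_bundle(text: str) -> tuple[list[str], list[str]]:
    """Return the bundle's links; nothing is rebuilt from loose local fields."""
    data = json.loads(text)
    if data.get("version") != BUNDLE_VERSION:
        raise ValueError(f"unsupported bundle version: {data.get('version')!r}")
    return data["compare"], data["watch"]
```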