UAB · Unbiased AI Bench · Glass box for model evals.
Every leaderboard, with receipts.
Editorial
Live · updated continuously

A more literate interface for AI benchmarks.

The original design included an editorial front. This route keeps it as a first-class reading mode instead of collapsing everything into one dashboard.
Surface · issue view
Benchmarks · 40
Models · 795
Build / data stamp

Read this before trusting a headline.

Data snapshot May 1, 2026 · Registry verification passed · 9 providers · 826 tracked models · Page refreshed May 7, 2026

If this stamp lags behind the repo, you are likely looking at an older build or cached deploy.
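For concreteness, here is a minimal sketch of the staleness check this stamp implies, in TypeScript. The DataStamp shape and stampLagDays name are hypothetical illustrations, not the product's API.

```ts
// Hypothetical staleness check for the build / data stamp.
// Shape and names are illustrative, not the site's actual API.
interface DataStamp {
  snapshotDate: string;  // e.g. "2026-05-01", when the data was captured
  refreshedDate: string; // e.g. "2026-05-07", when this page was built
}

const MS_PER_DAY = 86_400_000;

function stampLagDays(stamp: DataStamp): number {
  const snapshot = Date.parse(stamp.snapshotDate);
  const refreshed = Date.parse(stamp.refreshedDate);
  return Math.round((refreshed - snapshot) / MS_PER_DAY);
}

// A stamp that lags the repo by more than a few days is probably a
// cached deploy rather than a fresh build.
const lag = stampLagDays({ snapshotDate: "2026-05-01", refreshedDate: "2026-05-07" });
console.log(`stamp lag: ${lag} days`); // stamp lag: 6 days
```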

Issue 04 · Unbiased AI Bench · Editorial front

Public AI rankings need a more literate interface.

The point is not to crown one model. The point is to read the record: what was measured, by whom, under which judge, against which comparable group, and how stale the receipt already is.

Operating rules

1. Every score links back to a source page or benchmark receipt (a sketch of a receipt's shape follows this list).
2. Percentiles only exist inside exact comparable groups.
3. Coverage gaps stay visible instead of being quietly filled in.
4. Parser anomalies and mapping fixes stay in the changelog.

Bias becomes easier to inspect when the system refuses to flatten unlike things together.
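For rule 1, a minimal sketch of what a receipt could carry, assuming TypeScript. Every field name here is an assumption for illustration, not the site's actual schema.

```ts
// Hypothetical shape of a benchmark receipt (rule 1): every displayed
// score stays traceable to a source page, a judge, and a dated snapshot.
// Field names are illustrative, not the product's schema.
interface BenchmarkReceipt {
  model: string;            // e.g. "GPT-5.5"
  benchmark: string;        // e.g. a coding benchmark name
  benchmarkVersion: string; // version matters for comparability (rule 2)
  judge: string;            // who or what scored the run
  unit: "percent" | "elo" | "score"; // the raw measurement unit
  value: number;            // the raw value, never a synthetic scalar
  sourceUrl: string;        // the page the number was parsed from
  snapshotDate: string;     // when the receipt was captured
}
```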

Chat leaders

current
Gemini 3.1 Pro Preview
Google

AA · May 1, 2026 · aggregate score 92.4 across 2 chat receipts.

Open model

Coding leaders

current
GPT-5.5
OpenAI
#1 GPT-5.5 · 74.6%
#2 DeepSeek Reasoner · 73.8%
#3 Gemini 2.0 Pro Experimental · 70.9%

SL · Apr 29, 2026 · aggregate score 74.6 across 7 coding receipts (aggregation sketched below).

Open compare
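The aggregate lines above fold several receipts into one headline number. A minimal sketch, assuming a plain arithmetic mean; the product's actual weighting is not documented here.

```ts
// Hypothetical aggregation of receipts into a headline number,
// assuming an unweighted mean. Minimal receipt view for brevity;
// see the fuller BenchmarkReceipt sketch under Operating rules.
type Scored = { value: number };

function aggregateScore(receipts: Scored[]): number | null {
  if (receipts.length === 0) return null; // a coverage gap stays a gap (rule 3)
  const total = receipts.reduce((sum, r) => sum + r.value, 0);
  return Math.round((total / receipts.length) * 10) / 10; // one decimal, e.g. 74.6
}
```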

Freshest source

ops
Artificial Analysis
May 1, 2026

Parsed 808 Artificial Analysis records across 298 page-backed models and 94 multimodal leaderboard models.

Open source
A leaderboard without its measurement context is just a stronger-looking opinion. This product keeps the context on the page.
Method

Why percentiles only exist inside exact comparable groups

We normalize only when the underlying unit, judge, and benchmark version actually line up.

Read methodology →
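A minimal sketch of that rule, assuming receipts are keyed by benchmark version, judge, and unit before any percentile is computed. The Receipt shape and function names are illustrative, not the production pipeline.

```ts
// Hypothetical grouping step (rule 2): percentiles exist only inside
// groups where benchmark version, judge, and unit all match exactly.
interface Receipt {
  model: string;
  benchmark: string;
  benchmarkVersion: string;
  judge: string;
  unit: string;
  value: number;
}

function comparableKey(r: Receipt): string {
  return `${r.benchmark}@${r.benchmarkVersion}|${r.judge}|${r.unit}`;
}

function percentilesByGroup(receipts: Receipt[]): Map<string, Map<string, number>> {
  // Bucket receipts by exact comparable group.
  const groups = new Map<string, Receipt[]>();
  for (const r of receipts) {
    const key = comparableKey(r);
    const bucket = groups.get(key);
    if (bucket) bucket.push(r);
    else groups.set(key, [r]);
  }
  // Percentile ranks within each group only; no cross-group scalar.
  const out = new Map<string, Map<string, number>>();
  for (const [key, members] of groups) {
    const sorted = [...members].sort((a, b) => a.value - b.value);
    const ranks = new Map<string, number>();
    sorted.forEach((r, i) => {
      // Share of the comparable group at or below this receipt
      // (ties left unadjusted in this sketch).
      ranks.set(r.model, Math.round(((i + 1) / sorted.length) * 100));
    });
    out.set(key, ranks);
  }
  return out;
}
```

Keying on the full triple means a judged pass rate and a preference Elo can never land in the same percentile pool, which is the point of the rule.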
Compare

Head-to-head beats universal ranking when the surface is uneven

Comparisons stay grounded in shared coverage, raw values, and visible gaps instead of a universal scalar.

Open compare →
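A minimal sketch of a head-to-head restricted to shared coverage, assuming each model's scores arrive as a benchmark-to-value map. Names are hypothetical.

```ts
// Hypothetical head-to-head: compare two models only on benchmarks
// both have receipts for, as raw values; everything else is reported
// as a visible gap rather than imputed.
type Scores = Record<string, number>; // benchmark name -> raw value

function headToHead(a: Scores, b: Scores) {
  const shared = Object.keys(a).filter((bench) => bench in b);
  return {
    shared: shared.map((bench) => ({ bench, a: a[bench], b: b[bench] })),
    onlyA: Object.keys(a).filter((bench) => !(bench in b)), // gaps stay gaps
    onlyB: Object.keys(b).filter((bench) => !(bench in a)),
  };
}

// headToHead({ swe: 74.6, gpqa: 71.0 }, { swe: 73.8 })
// -> shared: [{ bench: "swe", a: 74.6, b: 73.8 }], onlyA: ["gpqa"], onlyB: []
```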
Operations

Changelog entries matter because data plumbing changes outcomes

Parser fixes, mapping corrections, and source updates change what appears true. They need their own paper trail.

Open changelog →
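A minimal sketch of what such a paper-trail entry could record, in TypeScript. The ChangelogEntry shape is an assumption, not the product's format.

```ts
// Hypothetical changelog entry (rule 4): parser fixes and mapping
// corrections get their own dated record, because they change what
// the leaderboard appears to say. Shape is illustrative.
interface ChangelogEntry {
  date: string;                                             // e.g. "2026-05-01"
  kind: "parser-fix" | "mapping-correction" | "source-update";
  summary: string;                                          // what changed and why
  affectedModels: string[];                                 // whose numbers moved
  before?: number;                                          // value as previously shown
  after?: number;                                           // value after the fix
}
```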