Unbiased AI Bench (UAB): glass box for model evals.
Every leaderboard, with receipts.
AA · meta aggregator

Artificial Analysis

Independent benchmark, speed, and price tracking across text and multimodal systems.
Verification status: verified (last checked May 1, 2026)

Evidence ledger

Modalities: text, image, audio, video
Cadence: continuous
API: available
Evaluations: 690
Verification: verified
Verified runtime: 682
Manual verified: 5
Relay / mirrored: 0
Backfilled: 3

Relay sources mirror another provider's public page; manual rows are checked against the cited page; backfilled rows are historical inserts; seeded rows are demo fixtures. Relay rows are supporting evidence, not first-party measurements.
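
As a rough illustration of how the ledger counts relate, the sketch below tallies rows by provenance and separates relay rows (supporting evidence) from the rest. The label strings and the `tally` function are hypothetical names chosen for this example, not the site's actual schema; only the counts come from the ledger above.

```python
from collections import Counter

# Hypothetical provenance labels mirroring the ledger's row categories.
KNOWN_LABELS = {"verified_runtime", "manual_verified", "relay", "backfilled", "seeded"}


def tally(provenances):
    """Count rows per provenance label and report relay rows separately."""
    counts = Counter(provenances)
    unknown = set(counts) - KNOWN_LABELS
    if unknown:
        raise ValueError(f"unknown provenance labels: {sorted(unknown)}")
    total = sum(counts.values())
    supporting = counts["relay"]  # relay rows are supporting evidence only
    return {
        "total": total,
        "supporting": supporting,
        "non_relay": total - supporting,
        "by_label": dict(counts),
    }


# Row counts taken from the ledger: 682 runtime, 5 manual, 0 relay, 3 backfilled.
rows = ["verified_runtime"] * 682 + ["manual_verified"] * 5 + ["backfilled"] * 3
summary = tally(rows)
```

With the ledger's figures, the per-category counts add up to the reported number of evaluations (682 + 5 + 0 + 3 = 690), and no rows fall into the supporting-only relay category.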

Operational state

snapshot · Latest pull · May 1, 2026 · json

parser · Parsed 808 Artificial Analysis records across 298 page-backed models and 94 multimodal leaderboard models. · 0.5.0 · ok

verify · artificial-analysis verification finished with status verified. · May 1, 2026 · verified

open · Artificial Analysis does not expose promotion-ready model pages for some tracked models under the current exact discovery rules: 24 tracked models have no sitemap-discovered model page. · May 1, 2026 · benchmark_mapping

Benchmarks from this source

  • Intelligence Index: composite intelligence (metric: Index)
  • Long Context Reasoning: long-document reasoning (metric: Score)
  • Text to Image: blind image generation preference (metric: Arena rating)
  • Image Editing: blind image editing preference (metric: Arena rating)
  • Text to Video: blind video generation preference (metric: Arena rating)
  • Image to Video: blind image-to-video preference (metric: Arena rating)

Latest change explanation

The artificial-analysis source changed relative to the previous run, artificial-analysis-20260501T202653Z. Four causes were recorded: source_snapshot, parser_diff, mapping, and benchmark_movement.

  • Source snapshot changed: The saved raw source snapshot changed relative to the previous run.
  • Parser output changed: The parser metadata or warnings shifted relative to the previous run.
  • Mapping or review queue changed: Mapping-related signals changed across 2 fields.
  • Benchmark coverage or values moved: 75 benchmark rows were added, 0 removed, and 20 existing rows changed value or evaluation date.
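
The added / removed / changed breakdown in the last bullet can be sketched as a comparison of two runs keyed by model and benchmark. This is only a minimal illustration under assumed data shapes: the `diff_rows` function, the `(model, benchmark)` key, and the `(value, evaluation_date)` payload are hypothetical and do not describe the site's actual pipeline, and the sample rows are invented.

```python
def diff_rows(previous, current):
    """Compare two runs of benchmark rows.

    Each run is a dict mapping (model, benchmark) to (value, evaluation_date).
    A row counts as changed when either its value or its date differs.
    """
    prev_keys, curr_keys = previous.keys(), current.keys()
    return {
        "added": sorted(curr_keys - prev_keys),
        "removed": sorted(prev_keys - curr_keys),
        "changed": sorted(
            key for key in curr_keys & prev_keys if current[key] != previous[key]
        ),
    }


# Invented sample rows, for illustration only.
previous_run = {
    ("model-a", "Intelligence Index"): (50.0, "2026-04-30"),
    ("model-b", "Text to Image"): (1100.0, "2026-04-30"),
}
current_run = {
    ("model-a", "Intelligence Index"): (51.5, "2026-05-01"),  # value and date moved
    ("model-b", "Text to Image"): (1100.0, "2026-04-30"),     # unchanged
    ("model-c", "Image Editing"): (990.0, "2026-05-01"),      # new row
}

movement = diff_rows(previous_run, current_run)
```

Counting the entries in each list would give figures of the kind reported above (rows added, removed, and changed).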