SL · benchmark platform

Scale Labs

Rubric-heavy frontier evals across agentic coding, visual-language understanding, spoken dialogue, tutoring, and hard reasoning.

verification status

verified

Last checked May 1, 2026

Evidence ledger

Modalities: text, code, vision, audio
Cadence: release-based
API: not public
Evaluations: 98
Verification: verified
Verified runtime: 70
Manual verified: 0
Relay / mirrored: 0
Backfilled: 28

Relay sources mirror another provider's public page; manual rows are checked against the cited page; backfilled rows are historical inserts; seeded rows are demo fixtures. Relay rows are supporting evidence, not first-party measurements.
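As a quick sanity check on the ledger, the per-provenance counts should add up to the evaluation total (70 + 0 + 0 + 28 = 98). The sketch below is only illustrative: the dictionary keys and function names are hypothetical labels chosen for this example, not identifiers from the platform, and the counts are copied by hand from the ledger.

```python
# Counts copied from the evidence ledger; key names are hypothetical labels,
# not fields from any real API (the platform's API is listed as not public).
LEDGER_COUNTS = {
    "verified_runtime": 70,
    "manual_verified": 0,
    "relay_mirrored": 0,
    "backfilled": 28,
}
TOTAL_EVALUATIONS = 98


def ledger_is_consistent(counts: dict[str, int], total: int) -> bool:
    """Return True when the provenance counts sum to the reported total."""
    return sum(counts.values()) == total


def non_relay_count(counts: dict[str, int]) -> int:
    """Count rows that are not relay/mirrored, since relay rows are
    supporting evidence rather than first-party measurements."""
    return sum(n for kind, n in counts.items() if kind != "relay_mirrored")


if __name__ == "__main__":
    print("consistent:", ledger_is_consistent(LEDGER_COUNTS, TOTAL_EVALUATIONS))
    print("non-relay rows:", non_relay_count(LEDGER_COUNTS))
```

With the counts shown here, no rows are relay rows, so excluding them leaves the total unchanged.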

Operational state

snapshot · Latest pull · May 1, 2026 · json

parser · Loaded 70 verified benchmark records for scale-labs. · 0.1.0

verify · scale-labs verification finished with status verified. · May 1, 2026 · verified

Benchmarks from this source

EnigmaEval · Hard reasoning · Pass rate

VISTA · Vision-language understanding · Score

TutorBench · STEM tutoring quality · Score

VTB · Vision-language reasoning · APR

PRBench Legal · Professional legal reasoning · Score

HiL-Bench · Human-in-the-loop software tasks · Success rate

MASK · Hidden-goal honesty and safety · Honesty score

MultiNRC · Multilingual reasoning · Score

Latest change explanation

scale-labs matched scale-labs-20260501T202649Z; no notable causes of change were detected.