Verified but aging
This is an efficiency signal, so it belongs beside quality rather than being mistaken for quality.
source: BridgeBench
metric: Throughput (t/s)
judge: Speed / cost
direction: higher better
group id: bridgebench_throughput_2026_04
domain: Coding
What it measures vs what it misses
✓ Measures
How quickly a model emits coding output once generation is underway.
✗ Misses
Output correctness. Editing quality.
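As a rough illustration of what a throughput figure in tokens per second captures, the sketch below times only the tokens that arrive after the first one. It assumes "once generation is underway" means the clock starts at the first emitted token; BridgeBench's actual measurement procedure is not described here, and the function name `decode_throughput` is hypothetical.

```python
from typing import Sequence


def decode_throughput(token_times: Sequence[float]) -> float:
    """Return tokens per second, measured from the first emitted token onward.

    token_times holds the arrival time (in seconds) of each output token,
    in ascending order. This is an assumed definition, not BridgeBench's.
    """
    if len(token_times) < 2:
        raise ValueError("need at least two token timestamps")
    elapsed = token_times[-1] - token_times[0]
    if elapsed <= 0:
        raise ValueError("last timestamp must be later than the first")
    # The first token starts the clock, so only the tokens after it are timed.
    return (len(token_times) - 1) / elapsed


# Five tokens arriving every 0.5 s after a 2 s wait for the first token.
rate = decode_throughput([2.0, 2.5, 3.0, 3.5, 4.0])
print(f"{rate:.1f} t/s")
```

Nothing in this calculation looks at what the tokens say, which is why a fast model can still score poorly on correctness or editing quality.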
Why this counts
It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.
Comparable-group rule
This percentile only compares models inside the exact benchmark/version group shown here. It is not a universal score.
What it misses
It does not fully capture repo-scale iteration, IDE ergonomics, or long debugging loops.
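The comparable-group rule can be sketched as a percentile computed over one group id only. This is a minimal illustration under stated assumptions: the percentile is taken as the share of models in the group that the target matches or beats, the `Score` type and `group_percentile` function are hypothetical, and the model names and values are made up for the example.

```python
from typing import Iterable, NamedTuple


class Score(NamedTuple):
    model: str
    group_id: str
    value: float


def group_percentile(scores: Iterable[Score], model: str, group_id: str,
                     higher_better: bool = True) -> float:
    """Percentile of `model` among scores that share `group_id` (assumed definition)."""
    # Scores from any other benchmark/version group are ignored entirely.
    peers = [s for s in scores if s.group_id == group_id]
    target = next((s for s in peers if s.model == model), None)
    if target is None:
        raise KeyError(f"{model!r} has no score in group {group_id!r}")
    if higher_better:
        matched = sum(1 for s in peers if s.value <= target.value)
    else:
        matched = sum(1 for s in peers if s.value >= target.value)
    return 100.0 * matched / len(peers)


# Illustrative data only; "other_group" is a hypothetical second group.
scores = [
    Score("model-a", "bridgebench_throughput_2026_04", 50.0),
    Score("model-b", "bridgebench_throughput_2026_04", 100.0),
    Score("model-c", "bridgebench_throughput_2026_04", 150.0),
    Score("model-d", "other_group", 500.0),
]
print(group_percentile(scores, "model-c", "bridgebench_throughput_2026_04"))
```

Because `model-d` belongs to a different group, its much larger value has no effect on the percentiles computed here, which is the sense in which the percentile is not a universal score.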