Verified but aging
This is an efficiency signal, so read it alongside quality metrics rather than mistaking it for one.
source: BridgeBench
metric: TTFT (time to first token, ms)
judge: Speed / cost
direction: lower is better
group id: bridgebench_ttft_2026_04
domain: Coding
What it measures vs what it misses
✓ Measures
Initial coding-response latency before tokens begin streaming.
✗ Misses
Final answer quality. Total task completion time.
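The gap between what TTFT measures and total completion time can be seen in a small simulation. The page does not describe BridgeBench's harness, so this sketch assumes TTFT is timed client-side from the start of the request until the first non-empty streamed chunk arrives. `simulated_stream` and `measure_latency` are hypothetical helpers, and the delays are made-up values, not benchmark data.

```python
import time
from typing import Iterable, Iterator, Tuple


def simulated_stream(first_delay_s: float, per_chunk_s: float, chunks: int) -> Iterator[str]:
    """Hypothetical stand-in for a model's streaming coding response."""
    time.sleep(first_delay_s)          # wait before the first token appears
    yield "chunk0 "
    for i in range(1, chunks):
        time.sleep(per_chunk_s)        # decode time for each later chunk
        yield f"chunk{i} "


def measure_latency(stream: Iterable[str]) -> Tuple[float, float]:
    """Return (ttft_ms, total_ms) for one streamed response.

    The TTFT timer stops at the first non-empty chunk; the total timer
    keeps running until the stream is exhausted.
    """
    start = time.perf_counter()
    ttft_ms = None
    for chunk in stream:
        if ttft_ms is None and chunk:
            ttft_ms = (time.perf_counter() - start) * 1000.0
    total_ms = (time.perf_counter() - start) * 1000.0
    if ttft_ms is None:
        raise ValueError("stream produced no tokens")
    return ttft_ms, total_ms


# Quick start with slow decode vs. slow start with fast decode (made-up timings).
quick_start = measure_latency(simulated_stream(0.01, 0.05, 20))
slow_start = measure_latency(simulated_stream(0.20, 0.005, 20))
print(f"quick start: TTFT {quick_start[0]:.0f} ms, total {quick_start[1]:.0f} ms")
print(f"slow start:  TTFT {slow_start[0]:.0f} ms, total {slow_start[1]:.0f} ms")
```

The quick-start stream wins on TTFT but finishes later, which is why a low TTFT says nothing about total task completion time.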
Why this counts
It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than on marketing examples.
Comparable-group rule
This percentile only compares models inside the exact benchmark/version group shown here. It is not a universal score.
What it misses
It does not fully capture repo-scale iteration, IDE ergonomics, or long debugging loops.
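The comparable-group rule, combined with the "lower is better" direction, can be illustrated by ranking models only against peers that share a group id. The page does not say how BridgeBench computes its percentile, so the definition here (share of same-group peers beaten, ties counted as half a win) is an assumption. `Result`, `group_percentiles`, the model names, and the TTFT values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Result:
    model: str      # hypothetical model name
    group_id: str   # benchmark/version group, e.g. "bridgebench_ttft_2026_04"
    value: float    # TTFT in ms; lower is better


def group_percentiles(results: List[Result], group_id: str) -> Dict[str, float]:
    """Percentile rank (0-100) of each model within a single group.

    Assumed definition: the share of other models in the same group that
    this model beats, with ties counted as half a win. Results from other
    groups are filtered out before any comparison is made.
    """
    peers = [r for r in results if r.group_id == group_id]
    if len(peers) == 1:
        return {peers[0].model: 100.0}  # arbitrary convention for a lone model
    ranks: Dict[str, float] = {}
    for i, r in enumerate(peers):
        wins = 0.0
        for j, other in enumerate(peers):
            if i == j:
                continue
            if r.value < other.value:      # lower TTFT wins
                wins += 1.0
            elif r.value == other.value:
                wins += 0.5
        ranks[r.model] = 100.0 * wins / (len(peers) - 1)
    return ranks


# Made-up numbers for illustration, not BridgeBench results.
results = [
    Result("model-a", "bridgebench_ttft_2026_04", 420.0),
    Result("model-b", "bridgebench_ttft_2026_04", 610.0),
    Result("model-c", "bridgebench_ttft_2026_04", 980.0),
    Result("model-d", "other_benchmark_2025_11", 50.0),  # different group: ignored
]
print(group_percentiles(results, "bridgebench_ttft_2026_04"))
# {'model-a': 100.0, 'model-b': 50.0, 'model-c': 0.0}
```

Pooling `model-d` into the same ranking would push every other model's percentile down, which is exactly the cross-group comparison the rule rules out.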