Why this counts
It checks whether long-context claims survive contact with retrieval, memory, or long-document tasks.

Comparable-group rule
This percentile only compares models inside the exact benchmark/version group shown here. It is not a universal score.

What it misses
It does not guarantee good synthesis quality once real documents, tools, and latency constraints are involved.
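
To make the comparable-group rule concrete, here is a minimal sketch of a percentile computed only within one benchmark/version group. The source does not say how the percentile is calculated, so the definition used here (share of group members scoring at or below the model) is an assumption, and the function name `group_percentiles`, the record layout, and all model names and scores are hypothetical.

```python
from collections import defaultdict


def group_percentiles(records):
    """Return {(model, benchmark, version): percentile}, ranked only within each group."""
    # Bucket scores by the exact (benchmark, version) pair so groups never mix.
    groups = defaultdict(list)
    for model, benchmark, version, score in records:
        groups[(benchmark, version)].append((model, score))

    result = {}
    for (benchmark, version), members in groups.items():
        scores = [score for _, score in members]
        for model, score in members:
            # Assumed definition: percentage of group members scoring at or below this model.
            at_or_below = sum(1 for s in scores if s <= score)
            result[(model, benchmark, version)] = 100.0 * at_or_below / len(scores)
    return result


# Hypothetical records: (model, benchmark, version, score)
records = [
    ("model-a", "long-doc-qa", "v1", 0.62),
    ("model-b", "long-doc-qa", "v1", 0.71),
    ("model-c", "long-doc-qa", "v1", 0.55),
    ("model-a", "long-doc-qa", "v2", 0.40),
    ("model-d", "long-doc-qa", "v2", 0.48),
]
percentiles = group_percentiles(records)
```

In this made-up data, `model-a` gets a different percentile in the `v1` group than in the `v2` group, because each value is relative to a different set of models. That is why a percentile from one group should not be read against a percentile from another.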