Visible tradeoffs
This is a retrieval signal, so it is best read as search-stack quality rather than broad model capability.
source: MTEB
metric: NDCG@10 (ndcg)
judge: Retrieval
direction: higher better
group id: mteb_retrieval_en_v2
domain: Embeddings / retrieval
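For readers unfamiliar with the metric, here is a minimal sketch of how NDCG@10 can be computed for a single query. It assumes the common linear-gain form with a log2 rank discount; the exact variant and tooling used to produce the scores in this group are not stated here. The function names and the sample relevance labels are hypothetical.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k results (linear gain, assumed)."""
    # rank is 0-based, so the discount is log2(rank + 2): 1.0 at the top position.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, all_relevances, k=10):
    """NDCG@k: DCG of the returned ranking divided by the DCG of an ideal ranking.

    ranked_relevances: relevance labels of the retrieved documents, in rank order.
    all_relevances: relevance labels of every judged document for the query.
    """
    ideal = sorted(all_relevances, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0:
        return 0.0  # no relevant documents judged for this query
    return dcg_at_k(ranked_relevances, k) / ideal_dcg


# Hypothetical query: the only relevant document was retrieved at rank 2.
score = ndcg_at_k([0, 1, 0], [1, 0, 0])
print(round(score, 4))
```

A benchmark score is then typically an average of this per-query value over all queries and tasks, which is why it reflects ranking quality near the top of the result list and nothing about generation.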
What it measures vs what it misses
✓ Measures
Embedding quality for retrieval tasks.
✗ Misses
Chat quality, generation, latency.
Why this counts
It is one of the few direct signals for retrieval stacks, where embedding quality matters more than chat style.
Comparable-group rule
This percentile only compares models inside the exact benchmark/version group shown here. It is not a universal score.
What it misses
It does not tell you whether the same model is strong at generation, ranking policy, or final answer quality.
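The comparable-group rule can be illustrated with a small sketch. It assumes the percentile is the share of models in the same group that score at or below the target model; the exact formula behind the percentile shown on this page is not stated, so treat this as one plausible reading. The record layout, model names, and scores are hypothetical.

```python
def percentile_in_group(records, group_id, model, higher_is_better=True):
    """Percentile of `model` among models that share the same group id.

    records: iterable of (group_id, model_name, score) tuples (hypothetical layout).
    Models from other groups are ignored entirely, which is the point of the rule.
    """
    scores = {name: score for gid, name, score in records if gid == group_id}
    target = scores[model]  # raises KeyError if the model is not in this group
    if higher_is_better:
        not_better = sum(1 for s in scores.values() if s <= target)
    else:
        not_better = sum(1 for s in scores.values() if s >= target)
    return 100.0 * not_better / len(scores)


# Hypothetical data: three models in this group, one in an unrelated group.
records = [
    ("mteb_retrieval_en_v2", "model-a", 0.50),
    ("mteb_retrieval_en_v2", "model-b", 0.60),
    ("mteb_retrieval_en_v2", "model-c", 0.40),
    ("other_group", "model-d", 0.99),
]
print(percentile_in_group(records, "mteb_retrieval_en_v2", "model-b"))
```

Note that model-d's higher score in another group has no effect on the result, because scores from different benchmark versions are never compared.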