source
Arena
Observed user preference when models answer with live search or grounding behavior in the loop. How often a search-enabled response wins in head-to-head comparisons on web-aware tasks.
Ground-truth retrieval accuracy or citation faithfulness in a fully objective sense. Tool-trace quality, latency, and hidden search-stack differences between providers.