source
Arena
Observed user preference on document-grounded outputs. How often a model wins when users compare file-aware responses head to head.
Objective document extraction accuracy. Fine-grained breakdowns across OCR, tables, charts, and legal or financial workflows.