source
Arena
Perceived usefulness and polish on browser-facing coding tasks. How often a model's generated web experience wins in side-by-side judgments.
Objective correctness or runtime reliability. Accessibility, maintainability, and deploy-time quality unless voters notice them directly.