UAB · Unbiased AI Bench · Glass box for model evals.
Every leaderboard, with receipts.
Coverage
Live · updated continuously

Coverage gaps and unresolved model mapping pressure.

This view turns hidden benchmark blind spots into an explicit maintenance surface: source health, unresolved labels, and tracked models still missing usable coverage.
Sources · 9
Blocked · 0
Missing model/preset cases · 112

Source Coverage

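Source | Status | Benchmarks | Domains | Open reviews | Blocking
Arena | verified | 11 | 8 | 1 | 0
LiveBench | verified | 4 | 4 | 1 | 0
Artificial Analysis | verified | 7 | 5 | 1 | 0
BridgeBench | verified | 5 | 2 | 1 | 0
Terminal-Bench | verified | 1 | 1 | 1 | 0
LLMBase | relay | 1 | 1 | 0 | 0
Scale Labs | verified | 8 | 5 | 0 | 0
OpenCompass | verified | 1 | 1 | 0 | 0
MTEB | verified | 1 | 1 | 0 | 0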

Tracked But Missing Coverage

Preset | Model | Exact | Indirect | Visible | Gap
Everyday chatbot | Llama 4 Maverick | 67% | 0% | 67% | Missing benchmark coverage in Long context.
Everyday chatbot | Qwen3 235B A22B | 67% | 0% | 67% | Missing benchmark coverage in Long context.
Everyday chatbot | Claude Haiku 4.5 | 33% | 0% | 33% | Missing benchmark coverage in Reasoning / math / science, Long context.
Everyday chatbot | DeepSeek Chat | 33% | 0% | 33% | Missing benchmark coverage in Reasoning / math / science, Long context.
Everyday chatbot | DeepSeek V3 (Dec) | 33% | 33% | 33% | Only indirect or proxy evidence is currently available.
Everyday chatbot | DeepSeek V3 0324 | 33% | 33% | 33% | Only indirect or proxy evidence is currently available.
Everyday chatbot | DeepSeek V3.1 | 33% | 33% | 33% | Only indirect or proxy evidence is currently available.
Everyday chatbot | DeepSeek V3.1 Terminus | 33% | 33% | 33% | Only indirect or proxy evidence is currently available.
Everyday chatbot | DeepSeek V3.2 Exp | 33% | 0% | 33% | Missing benchmark coverage in Reasoning / math / science, Long context.
Everyday chatbot | DeepSeek V4 Flash (Max) | 33% | 33% | 33% | Only indirect or proxy evidence is currently available.
Everyday chatbot | DeepSeek V4 Pro (Max) | 33% | 33% | 33% | Only indirect or proxy evidence is currently available.
Everyday chatbot | Gemini 2.5 Flash | 33% | 0% | 33% | Missing benchmark coverage in Reasoning / math / science, Long context.
Coding copilot | Grok 4.20 | 33% | 0% | 33% | Missing benchmark coverage in Reasoning / math / science, Long context.
Coding copilot | Claude Haiku 4.5 | 33% | 0% | 33% | Missing benchmark coverage in Reasoning / math / science, Long context.
Coding copilot | DeepSeek Chat | 33% | 0% | 33% | Missing benchmark coverage in Reasoning / math / science, Long context.
Coding copilot | DeepSeek V3.2 Exp | 33% | 0% | 33% | Missing benchmark coverage in Reasoning / math / science, Long context.
Coding copilot | Gemini 3 Flash | 33% | 0% | 33% | Missing benchmark coverage in Reasoning / math / science, Long context.
Coding copilot | GPT-5.4 mini | 33% | 0% | 33% | Missing benchmark coverage in Reasoning / math / science, Long context.
Coding copilot | Grok 4 Fast | 33% | 0% | 33% | Missing benchmark coverage in Reasoning / math / science, Long context.
Coding copilot | Grok 4.1 Fast | 33% | 0% | 33% | Missing benchmark coverage in Reasoning / math / science, Long context.
Coding copilot | Grok 4.3 | 33% | 0% | 33% | Missing benchmark coverage in Reasoning / math / science, Long context.
Coding copilot | Mistral Large 3 | 33% | 0% | 33% | Missing benchmark coverage in Reasoning / math / science, Long context.
Coding copilot | Qwen3-Coder 480B A35B | 33% | 0% | 33% | Missing benchmark coverage in Reasoning / math / science, Long context.
Coding copilot | Qwen3.5 122B A10B | 33% | 0% | 33% | Missing benchmark coverage in Reasoning / math / science, Long context.
Research assistant | Gemini 3 Flash | 40% | 0% | 40% | Missing benchmark coverage in Reasoning / math / science, Long context, Embeddings / retrieval.
Research assistant | Grok 4.20 | 40% | 0% | 40% | Missing benchmark coverage in Reasoning / math / science, Long context, Embeddings / retrieval.
Research assistant | Llama 4 Maverick | 40% | 0% | 40% | Missing benchmark coverage in Long context, Embeddings / retrieval, Search / tool use.
Research assistant | Qwen3 235B A22B | 40% | 0% | 40% | Missing benchmark coverage in Long context, Embeddings / retrieval, Search / tool use.
Research assistant | Claude Haiku 4.5 | 20% | 0% | 20% | Missing benchmark coverage in Reasoning / math / science, Long context, Embeddings / retrieval, Search / tool use.
Research assistant | Gemini 3.1 Flash-Lite Preview | 20% | 0% | 20% | Missing benchmark coverage in Long context, Document understanding, Embeddings / retrieval, Search / tool use.
Research assistant | GPT-4o | 20% | 0% | 20% | Missing benchmark coverage in Reasoning / math / science, Long context, Document understanding, Embeddings / retrieval.
Research assistant | Grok 4 Fast | 20% | 0% | 20% | Missing benchmark coverage in Reasoning / math / science, Long context, Document understanding, Embeddings / retrieval.
Research assistant | Grok 4.1 Fast | 20% | 0% | 20% | Missing benchmark coverage in Reasoning / math / science, Long context, Document understanding, Embeddings / retrieval.
Research assistant | Grok 4.3 | 20% | 0% | 20% | Missing benchmark coverage in Reasoning / math / science, Long context, Document understanding, Embeddings / retrieval.
Research assistant | DeepSeek Reasoner | 0% | 20% | 20% | No exact-match benchmark rows are currently visible for this preset.
Research assistant | diffbot-small-xl | 0% | 20% | 20% | No exact-match benchmark rows are currently visible for this preset.
Retrieval | Codestral Embed | 0% | 0% | 0% | No exact-match benchmark rows are currently visible for this preset.
Retrieval | Gemini Embedding 001 | 0% | 0% | 0% | No exact-match benchmark rows are currently visible for this preset.
Retrieval | Gemini Embedding 2 | 0% | 0% | 0% | No exact-match benchmark rows are currently visible for this preset.
Retrieval | Mistral Embed | 0% | 0% | 0% | No exact-match benchmark rows are currently visible for this preset.
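
The "Missing benchmark coverage" rows above are consistent with reading Exact as the share of a preset's domains that have at least one exact-match benchmark row. For example, a Research assistant row that lists three missing domains shows 40%, which matches 2 of 5 domains covered. The Python sketch below illustrates that arithmetic only. It is not the site's implementation: the function names are hypothetical, treating the five domain names from the gap messages as the full Research assistant preset is an assumption, and the Indirect and Visible columns are not modeled.

```python
# Assumption: these five names, taken from the gap messages in the table,
# make up the whole Research assistant preset. The site may define it differently.
RESEARCH_ASSISTANT_DOMAINS = [
    "Reasoning / math / science",
    "Long context",
    "Document understanding",
    "Embeddings / retrieval",
    "Search / tool use",
]


def exact_coverage_percent(required_domains, missing_domains):
    """Return the percentage of required domains not listed as missing.

    Hypothetical helper: rounds to a whole percent, as the table displays.
    """
    required = list(required_domains)
    if not required:
        raise ValueError("a preset needs at least one required domain")
    missing = set(missing_domains)
    unknown = missing - set(required)
    if unknown:
        raise ValueError(f"missing domains not in preset: {sorted(unknown)}")
    covered = [d for d in required if d not in missing]
    return round(100 * len(covered) / len(required))


def gap_message(missing_domains):
    """Format a gap note in the same wording the table uses (hypothetical helper)."""
    return "Missing benchmark coverage in " + ", ".join(missing_domains) + "."


if __name__ == "__main__":
    missing = ["Long context", "Embeddings / retrieval", "Search / tool use"]
    pct = exact_coverage_percent(RESEARCH_ASSISTANT_DOMAINS, missing)
    print(f"{pct}% | {gap_message(missing)}")
```

Under the same reading, a three-domain preset with one missing domain gives 67% and with two missing domains gives 33%, which matches the Everyday chatbot and Coding copilot rows that carry this gap message.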