Coverage

Coverage gaps and unresolved model-mapping pressure.

This view turns hidden benchmark blind spots into an explicit maintenance surface. It reports source health, unresolved labels, and tracked models that still lack usable coverage.

Sources · 9
Blocked · 0
Missing model/preset cases · 112

Source Coverage

Source	Status	Benchmarks	Domains	Open reviews
Arena	verified	11	8	1
LiveBench	verified	4	4	1
Artificial Analysis	verified	7	5	1
BridgeBench	verified	5	2	1
Terminal-Bench	verified	1	1	1
LLMBase	relay	1	1	0
Scale Labs	verified	8	5	0
OpenCompass	verified	1	1	0
MTEB	verified	1	1	0
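
The summary counts at the top of this view can be cross-checked against the Source Coverage table. The following is a minimal sketch, assuming the table is available as tab-separated text. The function name `summarize_sources` is hypothetical, and the status value `blocked` is an assumption, since no blocked source appears in the table.

```python
import csv
import io

# Source Coverage table copied from this view, as tab-separated text.
SOURCE_TABLE = """Source\tStatus\tBenchmarks\tDomains\tOpen reviews
Arena\tverified\t11\t8\t1
LiveBench\tverified\t4\t4\t1
Artificial Analysis\tverified\t7\t5\t1
BridgeBench\tverified\t5\t2\t1
Terminal-Bench\tverified\t1\t1\t1
LLMBase\trelay\t1\t1\t0
Scale Labs\tverified\t8\t5\t0
OpenCompass\tverified\t1\t1\t0
MTEB\tverified\t1\t1\t0
"""


def summarize_sources(tsv_text):
    """Count sources, blocked sources, and open reviews (hypothetical helper)."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    return {
        "sources": len(rows),
        # "blocked" is an assumed status label; the table only shows
        # "verified" and "relay".
        "blocked": sum(1 for r in rows if r["Status"] == "blocked"),
        "open_reviews": sum(int(r["Open reviews"]) for r in rows),
    }


summary = summarize_sources(SOURCE_TABLE)
print(summary)
```

Under these assumptions the result agrees with the tiles above (9 sources, 0 blocked) and also gives the total number of open reviews across sources.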

Tracked But Missing Coverage

Preset	Model	Exact	Indirect	Visible	Gap
Everyday chatbot	Llama 4 Maverick	67%	0%	67%	Missing benchmark coverage in Long context.
Everyday chatbot	Qwen3 235B A22B	67%	0%	67%	Missing benchmark coverage in Long context.
Everyday chatbot	Claude Haiku 4.5	33%	0%	33%	Missing benchmark coverage in Reasoning / math / science, Long context.
Everyday chatbot	DeepSeek Chat	33%	0%	33%	Missing benchmark coverage in Reasoning / math / science, Long context.
Everyday chatbot	DeepSeek V3 (Dec)	33%	33%	33%	Only indirect or proxy evidence is currently available.
Everyday chatbot	DeepSeek V3 0324	33%	33%	33%	Only indirect or proxy evidence is currently available.
Everyday chatbot	DeepSeek V3.1	33%	33%	33%	Only indirect or proxy evidence is currently available.
Everyday chatbot	DeepSeek V3.1 Terminus	33%	33%	33%	Only indirect or proxy evidence is currently available.
Everyday chatbot	DeepSeek V3.2 Exp	33%	0%	33%	Missing benchmark coverage in Reasoning / math / science, Long context.
Everyday chatbot	DeepSeek V4 Flash (Max)	33%	33%	33%	Only indirect or proxy evidence is currently available.
Everyday chatbot	DeepSeek V4 Pro (Max)	33%	33%	33%	Only indirect or proxy evidence is currently available.
Everyday chatbot	Gemini 2.5 Flash	33%	0%	33%	Missing benchmark coverage in Reasoning / math / science, Long context.
Coding copilot	Grok 4.20	33%	0%	33%	Missing benchmark coverage in Reasoning / math / science, Long context.
Coding copilot	Claude Haiku 4.5	33%	0%	33%	Missing benchmark coverage in Reasoning / math / science, Long context.
Coding copilot	DeepSeek Chat	33%	0%	33%	Missing benchmark coverage in Reasoning / math / science, Long context.
Coding copilot	DeepSeek V3.2 Exp	33%	0%	33%	Missing benchmark coverage in Reasoning / math / science, Long context.
Coding copilot	Gemini 3 Flash	33%	0%	33%	Missing benchmark coverage in Reasoning / math / science, Long context.
Coding copilot	GPT-5.4 mini	33%	0%	33%	Missing benchmark coverage in Reasoning / math / science, Long context.
Coding copilot	Grok 4 Fast	33%	0%	33%	Missing benchmark coverage in Reasoning / math / science, Long context.
Coding copilot	Grok 4.1 Fast	33%	0%	33%	Missing benchmark coverage in Reasoning / math / science, Long context.
Coding copilot	Grok 4.3	33%	0%	33%	Missing benchmark coverage in Reasoning / math / science, Long context.
Coding copilot	Mistral Large 3	33%	0%	33%	Missing benchmark coverage in Reasoning / math / science, Long context.
Coding copilot	Qwen3-Coder 480B A35B	33%	0%	33%	Missing benchmark coverage in Reasoning / math / science, Long context.
Coding copilot	Qwen3.5 122B A10B	33%	0%	33%	Missing benchmark coverage in Reasoning / math / science, Long context.
Research assistant	Gemini 3 Flash	40%	0%	40%	Missing benchmark coverage in Reasoning / math / science, Long context, Embeddings / retrieval.
Research assistant	Grok 4.20	40%	0%	40%	Missing benchmark coverage in Reasoning / math / science, Long context, Embeddings / retrieval.
Research assistant	Llama 4 Maverick	40%	0%	40%	Missing benchmark coverage in Long context, Embeddings / retrieval, Search / tool use.
Research assistant	Qwen3 235B A22B	40%	0%	40%	Missing benchmark coverage in Long context, Embeddings / retrieval, Search / tool use.
Research assistant	Claude Haiku 4.5	20%	0%	20%	Missing benchmark coverage in Reasoning / math / science, Long context, Embeddings / retrieval, Search / tool use.
Research assistant	Gemini 3.1 Flash-Lite Preview	20%	0%	20%	Missing benchmark coverage in Long context, Document understanding, Embeddings / retrieval, Search / tool use.
Research assistant	GPT-4o	20%	0%	20%	Missing benchmark coverage in Reasoning / math / science, Long context, Document understanding, Embeddings / retrieval.
Research assistant	Grok 4 Fast	20%	0%	20%	Missing benchmark coverage in Reasoning / math / science, Long context, Document understanding, Embeddings / retrieval.
Research assistant	Grok 4.1 Fast	20%	0%	20%	Missing benchmark coverage in Reasoning / math / science, Long context, Document understanding, Embeddings / retrieval.
Research assistant	Grok 4.3	20%	0%	20%	Missing benchmark coverage in Reasoning / math / science, Long context, Document understanding, Embeddings / retrieval.
Research assistant	DeepSeek Reasoner	0%	20%	20%	No exact-match benchmark rows are currently visible for this preset.
Research assistant	diffbot-small-xl	0%	20%	20%	No exact-match benchmark rows are currently visible for this preset.
Retrieval	Codestral Embed	0%	0%	0%	No exact-match benchmark rows are currently visible for this preset.
Retrieval	Gemini Embedding 001	0%	0%	0%	No exact-match benchmark rows are currently visible for this preset.
Retrieval	Gemini Embedding 2	0%	0%	0%	No exact-match benchmark rows are currently visible for this preset.
Retrieval	Mistral Embed	0%	0%	0%	No exact-match benchmark rows are currently visible for this preset.
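
The Gap column names the missing domains in free text, so a tally of which domains are missing most often needs a small parsing step. The following is a minimal sketch, assuming Gap cells follow the wording shown in the table. The helper `missing_domains` is hypothetical, and `ROWS` holds only three rows copied from the table for illustration.

```python
from collections import Counter

# Wording used by Gap cells that list missing domains.
PREFIX = "Missing benchmark coverage in "


def missing_domains(gap):
    """Return the domains named in a Gap cell, or [] for other gap messages."""
    if not gap.startswith(PREFIX):
        return []
    listed = gap[len(PREFIX):].rstrip(".")
    return [domain.strip() for domain in listed.split(",")]


# (preset, model, gap) tuples copied from three rows of the table above.
ROWS = [
    ("Everyday chatbot", "Llama 4 Maverick",
     "Missing benchmark coverage in Long context."),
    ("Coding copilot", "Grok 4.20",
     "Missing benchmark coverage in Reasoning / math / science, Long context."),
    ("Research assistant", "DeepSeek Reasoner",
     "No exact-match benchmark rows are currently visible for this preset."),
]

# Count how often each domain is reported missing across the sample rows.
counts = Counter(d for _, _, gap in ROWS for d in missing_domains(gap))
print(counts.most_common())
```

Splitting on commas works here because the domain names in this table use slashes rather than commas internally. Rows whose Gap cell carries a different message contribute nothing to the tally.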