Coverage
Coverage gaps and unresolved model-mapping pressure.
This view turns benchmark blind spots into an explicit maintenance surface: source health, unresolved labels, and tracked models that still lack usable coverage.
Source Coverage
| Source | Status | Benchmarks | Domains | Open reviews | Blocking |
|---|---|---|---|---|---|
| Arena | verified | 11 | 8 | 1 | 0 |
| LiveBench | verified | 4 | 4 | 1 | 0 |
| Artificial Analysis | verified | 7 | 5 | 1 | 0 |
| BridgeBench | verified | 5 | 2 | 1 | 0 |
| Terminal-Bench | verified | 1 | 1 | 1 | 0 |
| LLMBase | relay | 1 | 1 | 0 | 0 |
| Scale Labs | verified | 8 | 5 | 0 | 0 |
| OpenCompass | verified | 1 | 1 | 0 | 0 |
| MTEB | verified | 1 | 1 | 0 | 0 |
Tracked But Missing Coverage
| Preset | Model | Exact | Indirect | Visible | Gap |
|---|---|---|---|---|---|
| Everyday chatbot | Llama 4 Maverick | 67% | 0% | 67% | Missing benchmark coverage in Long context. |
| Everyday chatbot | Qwen3 235B A22B | 67% | 0% | 67% | Missing benchmark coverage in Long context. |
| Everyday chatbot | Claude Haiku 4.5 | 33% | 0% | 33% | Missing benchmark coverage in Reasoning / math / science, Long context. |
| Everyday chatbot | DeepSeek Chat | 33% | 0% | 33% | Missing benchmark coverage in Reasoning / math / science, Long context. |
| Everyday chatbot | DeepSeek V3 (Dec) | 33% | 33% | 33% | Only indirect or proxy evidence is currently available. |
| Everyday chatbot | DeepSeek V3 0324 | 33% | 33% | 33% | Only indirect or proxy evidence is currently available. |
| Everyday chatbot | DeepSeek V3.1 | 33% | 33% | 33% | Only indirect or proxy evidence is currently available. |
| Everyday chatbot | DeepSeek V3.1 Terminus | 33% | 33% | 33% | Only indirect or proxy evidence is currently available. |
| Everyday chatbot | DeepSeek V3.2 Exp | 33% | 0% | 33% | Missing benchmark coverage in Reasoning / math / science, Long context. |
| Everyday chatbot | DeepSeek V4 Flash (Max) | 33% | 33% | 33% | Only indirect or proxy evidence is currently available. |
| Everyday chatbot | DeepSeek V4 Pro (Max) | 33% | 33% | 33% | Only indirect or proxy evidence is currently available. |
| Everyday chatbot | Gemini 2.5 Flash | 33% | 0% | 33% | Missing benchmark coverage in Reasoning / math / science, Long context. |
| Coding copilot | Grok 4.20 | 33% | 0% | 33% | Missing benchmark coverage in Reasoning / math / science, Long context. |
| Coding copilot | Claude Haiku 4.5 | 33% | 0% | 33% | Missing benchmark coverage in Reasoning / math / science, Long context. |
| Coding copilot | DeepSeek Chat | 33% | 0% | 33% | Missing benchmark coverage in Reasoning / math / science, Long context. |
| Coding copilot | DeepSeek V3.2 Exp | 33% | 0% | 33% | Missing benchmark coverage in Reasoning / math / science, Long context. |
| Coding copilot | Gemini 3 Flash | 33% | 0% | 33% | Missing benchmark coverage in Reasoning / math / science, Long context. |
| Coding copilot | GPT-5.4 mini | 33% | 0% | 33% | Missing benchmark coverage in Reasoning / math / science, Long context. |
| Coding copilot | Grok 4 Fast | 33% | 0% | 33% | Missing benchmark coverage in Reasoning / math / science, Long context. |
| Coding copilot | Grok 4.1 Fast | 33% | 0% | 33% | Missing benchmark coverage in Reasoning / math / science, Long context. |
| Coding copilot | Grok 4.3 | 33% | 0% | 33% | Missing benchmark coverage in Reasoning / math / science, Long context. |
| Coding copilot | Mistral Large 3 | 33% | 0% | 33% | Missing benchmark coverage in Reasoning / math / science, Long context. |
| Coding copilot | Qwen3-Coder 480B A35B | 33% | 0% | 33% | Missing benchmark coverage in Reasoning / math / science, Long context. |
| Coding copilot | Qwen3.5 122B A10B | 33% | 0% | 33% | Missing benchmark coverage in Reasoning / math / science, Long context. |
| Research assistant | Gemini 3 Flash | 40% | 0% | 40% | Missing benchmark coverage in Reasoning / math / science, Long context, Embeddings / retrieval. |
| Research assistant | Grok 4.20 | 40% | 0% | 40% | Missing benchmark coverage in Reasoning / math / science, Long context, Embeddings / retrieval. |
| Research assistant | Llama 4 Maverick | 40% | 0% | 40% | Missing benchmark coverage in Long context, Embeddings / retrieval, Search / tool use. |
| Research assistant | Qwen3 235B A22B | 40% | 0% | 40% | Missing benchmark coverage in Long context, Embeddings / retrieval, Search / tool use. |
| Research assistant | Claude Haiku 4.5 | 20% | 0% | 20% | Missing benchmark coverage in Reasoning / math / science, Long context, Embeddings / retrieval, Search / tool use. |
| Research assistant | Gemini 3.1 Flash-Lite Preview | 20% | 0% | 20% | Missing benchmark coverage in Long context, Document understanding, Embeddings / retrieval, Search / tool use. |
| Research assistant | GPT-4o | 20% | 0% | 20% | Missing benchmark coverage in Reasoning / math / science, Long context, Document understanding, Embeddings / retrieval. |
| Research assistant | Grok 4 Fast | 20% | 0% | 20% | Missing benchmark coverage in Reasoning / math / science, Long context, Document understanding, Embeddings / retrieval. |
| Research assistant | Grok 4.1 Fast | 20% | 0% | 20% | Missing benchmark coverage in Reasoning / math / science, Long context, Document understanding, Embeddings / retrieval. |
| Research assistant | Grok 4.3 | 20% | 0% | 20% | Missing benchmark coverage in Reasoning / math / science, Long context, Document understanding, Embeddings / retrieval. |
| Research assistant | DeepSeek Reasoner | 0% | 20% | 20% | No exact-match benchmark rows are currently visible for this preset. |
| Research assistant | diffbot-small-xl | 0% | 20% | 20% | No exact-match benchmark rows are currently visible for this preset. |
| Retrieval | Codestral Embed | 0% | 0% | 0% | No exact-match benchmark rows are currently visible for this preset. |
| Retrieval | Gemini Embedding 001 | 0% | 0% | 0% | No exact-match benchmark rows are currently visible for this preset. |
| Retrieval | Gemini Embedding 2 | 0% | 0% | 0% | No exact-match benchmark rows are currently visible for this preset. |
| Retrieval | Mistral Embed | 0% | 0% | 0% | No exact-match benchmark rows are currently visible for this preset. |
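
For readers who want to triage this table outside the page, the following is a minimal sketch of one way to do it, assuming the rows are available as pipe-delimited text in the layout shown above. The names `parse_rows`, `models_without_exact` and `gaps_by_preset` are hypothetical helpers introduced for illustration, not part of any tool behind this view, and the sample contains only four rows copied from the table.

```python
from collections import Counter

# Four rows copied verbatim from the table above; a full export
# (assumed to use the same pipe-delimited layout) would parse the same way.
TABLE = """\
| Preset | Model | Exact | Indirect | Visible | Gap |
|---|---|---|---|---|---|
| Everyday chatbot | Llama 4 Maverick | 67% | 0% | 67% | Missing benchmark coverage in Long context. |
| Coding copilot | Grok 4.20 | 33% | 0% | 33% | Missing benchmark coverage in Reasoning / math / science, Long context. |
| Research assistant | DeepSeek Reasoner | 0% | 20% | 20% | No exact-match benchmark rows are currently visible for this preset. |
| Retrieval | Mistral Embed | 0% | 0% | 0% | No exact-match benchmark rows are currently visible for this preset. |
"""


def parse_rows(table: str) -> list[dict[str, str]]:
    """Turn a pipe-delimited table into one dict per data row (hypothetical helper)."""
    lines = [ln.strip() for ln in table.splitlines() if ln.strip()]
    header = [cell.strip() for cell in lines[0].strip("|").split("|")]
    rows = []
    for ln in lines[2:]:  # skip the header row and the |---| separator row
        cells = [cell.strip() for cell in ln.strip("|").split("|")]
        rows.append(dict(zip(header, cells)))
    return rows


def models_without_exact(rows: list[dict[str, str]]) -> list[str]:
    """List models whose Exact column is 0%."""
    return [r["Model"] for r in rows if int(r["Exact"].rstrip("%")) == 0]


rows = parse_rows(TABLE)
gaps_by_preset = Counter(r["Preset"] for r in rows)

print(models_without_exact(rows))
print(dict(gaps_by_preset))
```

Counting rows per preset gives a rough view of where mapping work is concentrated, while the zero-exact filter isolates the models that have no exact-match rows at all.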