Real model outputs from the cold-output sheet, judged with full persona + history from Neon. 713 replies across 13 models.

| | Deterministic (objective) | | | | | | | | | LLM-as-judge | | | | | | Size | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Model | Det% | Tone | Mem-facts | Flow | Mechanics | Adapt | Eval-hyg | Format | Anti-AI | In-voice | Pushback | Q-fit | Mem-use | Grounded | Judge% | Out tok | n |
| gpt-5.4 | 99% | 99% | 91% | 100% | 100% | 100% | 100% | 97% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 223 | 3 |
| kimi-k2.6 | 98% | 100% | 91% | 100% | 100% | 100% | 100% | 92% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 442 | 1 |
| claude-opus-4-7 | 97% | 98% | 91% | 100% | 100% | 100% | 100% | 92% | 98% | 100% | 100% | 100% | 100% | 100% | 100% | 91 | 2 |
| gpt-5.5 | 97% | 92% | 91% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 163 | 1 |
| gemini-3.5-flash | 96% | 96% | 82% | 100% | 100% | 100% | 100% | 92% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 74 | 2 |
| grok-4.3 | 99% | 100% | 91% | 100% | 100% | 100% | 100% | 97% | 100% | 100% | 100% | 100% | 100% | 67% | 93% | 125 | 3 |
| gemini-3.1-flash-lite | 96% | 94% | 86% | 100% | 100% | 100% | 100% | 92% | 98% | 100% | 50% | 100% | 100% | 100% | 90% | 86 | 2 |
| claude-sonnet-4-6 | 99% | 100% | 91% | 100% | 100% | 100% | 100% | 100% | 100% | 0% | 100% | 100% | 100% | 100% | 80% | 45 | 1 |
| claude-opus-4-8 | 98% | 100% | 91% | 100% | 100% | 100% | 100% | 92% | 100% | 100% | 0% | 100% | 100% | 100% | 80% | 106 | 1 |
| deepseek-v4-flash | 98% | 100% | 91% | 100% | 100% | 100% | 100% | 92% | 100% | 100% | 0% | 100% | 100% | 100% | 80% | 120 | 2 |
| deepseek-v4-pro | 97% | 98% | 86% | 100% | 100% | 100% | 100% | 92% | 100% | 100% | 50% | 50% | 100% | 100% | 80% | 118 | 2 |
| nemotron-3-120b | 97% | 96% | 91% | 100% | 100% | 100% | 100% | 92% | 100% | 100% | 0% | 100% | 100% | 100% | 80% | 155 | 1 |
| claude-opus-4-6 | 98% | 100% | 91% | 100% | 100% | 100% | 100% | 92% | 100% | 100% | 0% | 0% | 100% | 100% | 60% | 43 | 1 |

| | Deterministic (objective) | | | | | | | | | LLM-as-judge | | | | | | Size | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Model | Det% | Tone | Mem-facts | Flow | Mechanics | Adapt | Eval-hyg | Format | Anti-AI | In-voice | Pushback | Q-fit | Mem-use | Grounded | Judge% | Out tok | n |
| deepseek-v4-flash | 96% | 98% | 100% | 95% | 96% | 100% | 100% | 88% | 98% | 50% | 100% | 100% | 100% | 100% | 90% | 260 | 2 |
| gpt-5.4 | 97% | 97% | 94% | 95% | 100% | 100% | 100% | 92% | 99% | 100% | 100% | 33% | 100% | 100% | 87% | 708 | 3 |
| claude-opus-4-6 | 97% | 96% | 100% | 95% | 100% | 100% | 100% | 92% | 100% | 0% | 100% | 100% | 100% | 100% | 80% | 202 | 1 |
| claude-opus-4-8 | 97% | 96% | 100% | 95% | 100% | 100% | 100% | 92% | 100% | 0% | 100% | 100% | 100% | 100% | 80% | 229 | 1 |
| deepseek-v4-pro | 97% | 98% | 100% | 95% | 100% | 100% | 100% | 92% | 98% | 50% | 100% | 50% | 100% | 100% | 80% | 546 | 2 |
| gpt-5.5 | 96% | 96% | 91% | 95% | 100% | 100% | 100% | 92% | 100% | 0% | 100% | 100% | 100% | 100% | 80% | 627 | 1 |
| kimi-k2.6 | 96% | 96% | 100% | 95% | 100% | 100% | 100% | 92% | 96% | 100% | 100% | 0% | 100% | 100% | 80% | 1624 | 1 |
| nemotron-3-120b | 96% | 96% | 100% | 95% | 100% | 100% | 100% | 92% | 96% | 0% | 100% | 100% | 100% | 100% | 80% | 438 | 1 |
| gemini-3.5-flash | 96% | 96% | 91% | 95% | 100% | 100% | 100% | 92% | 100% | 50% | 100% | 0% | 100% | 100% | 70% | 96 | 2 |
| gemini-3.1-flash-lite | 96% | 94% | 91% | 95% | 100% | 100% | 100% | 92% | 98% | 0% | 100% | 50% | 100% | 100% | 70% | 119 | 2 |
| claude-sonnet-4-6 | 98% | 100% | 100% | 95% | 100% | 100% | 100% | 92% | 100% | 0% | 100% | 0% | 100% | 100% | 60% | 254 | 1 |
| grok-4.3 | 97% | 100% | 100% | 95% | 97% | 100% | 100% | 90% | 97% | 0% | 100% | 0% | 100% | 100% | 60% | 133 | 3 |
| claude-opus-4-7 | 97% | 98% | 100% | 95% | 100% | 100% | 100% | 92% | 96% | 0% | 100% | 0% | 100% | 100% | 60% | 156 | 2 |

| | Deterministic (objective) | | | | | | | | | LLM-as-judge | | | | | | Size | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Model | Det% | Tone | Mem-facts | Flow | Mechanics | Adapt | Eval-hyg | Format | Anti-AI | In-voice | Pushback | Q-fit | Mem-use | Grounded | Judge% | Out tok | n |
| claude-opus-4-8 | 96% | 96% | 100% | 89% | 100% | 100% | 100% | 92% | 96% | 100% | 100% | 100% | 100% | 100% | 100% | 267 | 1 |
| claude-opus-4-6 | 95% | 96% | 91% | 89% | 100% | 100% | 100% | 92% | 96% | 100% | 100% | 100% | 100% | 100% | 100% | 216 | 1 |
| grok-4.3 | 98% | 99% | 97% | 98% | 100% | 100% | 100% | 92% | 97% | 67% | 100% | 100% | 100% | 100% | 93% | 199 | 3 |
| gpt-5.4 | 97% | 99% | 97% | 91% | 100% | 100% | 100% | 92% | 99% | 67% | 100% | 100% | 100% | 100% | 93% | 434 | 3 |
| gemini-3.5-flash | 96% | 96% | 100% | 89% | 100% | 100% | 100% | 92% | 100% | 100% | 50% | 100% | 100% | 100% | 90% | 128 | 2 |
| deepseek-v4-flash | 96% | 98% | 95% | 95% | 100% | 100% | 100% | 92% | 94% | 100% | 100% | 50% | 100% | 100% | 90% | 326 | 2 |
| gpt-5.5 | 96% | 100% | 91% | 89% | 100% | 100% | 100% | 92% | 100% | 100% | 0% | 100% | 100% | 100% | 80% | 625 | 1 |
| claude-opus-4-7 | 96% | 98% | 95% | 92% | 100% | 100% | 100% | 92% | 96% | 100% | 0% | 50% | 100% | 100% | 70% | 174 | 2 |
| gemini-3.1-flash-lite | 96% | 96% | 91% | 92% | 100% | 100% | 100% | 92% | 98% | 0% | 50% | 100% | 100% | 100% | 70% | 150 | 2 |
| claude-sonnet-4-6 | 97% | 100% | 91% | 95% | 100% | 100% | 100% | 92% | 100% | 0% | 100% | 0% | 100% | 100% | 60% | 534 | 1 |
| deepseek-v4-pro | 97% | 98% | 95% | 92% | 100% | 100% | 100% | 92% | 100% | 0% | 0% | 100% | 100% | 100% | 60% | 355 | 2 |
| nemotron-3-120b | 97% | 100% | 100% | 95% | 100% | 100% | 100% | 92% | 96% | 0% | 0% | 0% | 100% | 100% | 40% | 572 | 1 |

| | Deterministic (objective) | | | | | | | | | LLM-as-judge | | | | | | Size | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Model | Det% | Tone | Mem-facts | Flow | Mechanics | Adapt | Eval-hyg | Format | Anti-AI | In-voice | Pushback | Q-fit | Mem-use | Grounded | Judge% | Out tok | n |
| gpt-5.5 | 94% | 88% | 100% | 95% | 100% | 100% | 100% | 92% | 92% | 100% | 100% | 100% | 100% | 100% | 100% | 507 | 1 |
| nemotron-3-120b | 99% | 100% | 100% | 95% | 100% | 100% | 100% | 100% | 100% | 100% | 0% | 100% | 100% | 100% | 80% | 1460 | 1 |
| claude-opus-4-6 | 98% | 100% | 100% | 95% | 100% | 100% | 100% | 92% | 100% | 100% | 0% | 100% | 100% | 100% | 80% | 50 | 1 |
| claude-opus-4-7 | 98% | 100% | 100% | 95% | 100% | 100% | 100% | 92% | 100% | 100% | 0% | 100% | 100% | 100% | 80% | 33 | 2 |
| claude-opus-4-8 | 98% | 100% | 100% | 95% | 100% | 100% | 100% | 92% | 100% | 100% | 0% | 100% | 100% | 100% | 80% | 100 | 1 |
| claude-sonnet-4-6 | 98% | 100% | 100% | 95% | 100% | 100% | 100% | 92% | 100% | 100% | 0% | 100% | 100% | 100% | 80% | 95 | 1 |
| gpt-5.4 | 98% | 100% | 100% | 95% | 100% | 100% | 100% | 92% | 100% | 100% | 0% | 100% | 100% | 100% | 80% | 374 | 3 |
| grok-4.3 | 98% | 100% | 100% | 95% | 100% | 100% | 100% | 92% | 100% | 100% | 0% | 100% | 100% | 100% | 80% | 75 | 3 |
| deepseek-v4-flash | 98% | 100% | 100% | 95% | 100% | 100% | 100% | 92% | 100% | 0% | 0% | 100% | 100% | 100% | 60% | 74 | 2 |
| gemini-3.1-flash-lite | 98% | 100% | 100% | 95% | 100% | 100% | 100% | 92% | 98% | 0% | 0% | 100% | 100% | 100% | 60% | 42 | 2 |
| deepseek-v4-pro | 97% | 100% | 100% | 95% | 100% | 100% | 100% | 92% | 96% | 0% | 0% | 100% | 100% | 100% | 60% | 86 | 2 |

| | Deterministic (objective) | | | | | | | | | LLM-as-judge | | | | | | Size | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Model | Det% | Tone | Mem-facts | Flow | Mechanics | Adapt | Eval-hyg | Format | Anti-AI | In-voice | Pushback | Q-fit | Mem-use | Grounded | Judge% | Out tok | n |
| claude-opus-4-8 | 98% | 99% | 98% | 100% | 100% | 100% | 100% | 92% | 98% | 100% | 80% | 40% | 100% | 100% | 84% | 222 | 5 |
| gpt-5.5 | 98% | 96% | 97% | 100% | 100% | 100% | 100% | 94% | 100% | 50% | 70% | 80% | 100% | 100% | 80% | 294 | 10 |
| gpt-5.4 | 98% | 96% | 95% | 100% | 99% | 100% | 100% | 93% | 100% | 60% | 73% | 60% | 100% | 100% | 79% | 376 | 15 |
| claude-opus-4-7 | 99% | 99% | 99% | 100% | 100% | 100% | 100% | 92% | 100% | 70% | 30% | 80% | 100% | 90% | 74% | 123 | 10 |
| gemini-3.1-flash-lite | 98% | 96% | 95% | 100% | 100% | 100% | 100% | 93% | 99% | 50% | 80% | 40% | 100% | 100% | 74% | 100 | 10 |
| claude-sonnet-4-6 | 98% | 100% | 95% | 100% | 100% | 100% | 100% | 92% | 97% | 80% | 20% | 80% | 100% | 80% | 72% | 322 | 5 |
| gemini-3.5-flash | 98% | 95% | 95% | 100% | 100% | 100% | 100% | 92% | 100% | 90% | 30% | 50% | 100% | 90% | 72% | 83 | 10 |
| claude-opus-4-6 | 99% | 98% | 98% | 100% | 100% | 100% | 100% | 92% | 100% | 60% | 40% | 60% | 100% | 80% | 68% | 368 | 5 |
| deepseek-v4-pro | 98% | 100% | 98% | 100% | 100% | 100% | 100% | 92% | 98% | 40% | 40% | 60% | 100% | 100% | 68% | 153 | 10 |
| deepseek-v4-flash | 98% | 98% | 99% | 100% | 100% | 100% | 100% | 92% | 98% | 20% | 50% | 80% | 100% | 90% | 68% | 212 | 10 |
| nemotron-3-120b | 97% | 98% | 91% | 100% | 100% | 100% | 100% | 91% | 96% | 20% | 20% | 80% | 100% | 100% | 64% | 382 | 5 |
| kimi-k2.6 | 99% | 100% | 100% | 100% | 100% | 100% | 100% | 92% | 98% | 100% | 0% | 0% | 100% | 100% | 60% | 912 | 2 |
| grok-4.3 | 98% | 98% | 96% | 100% | 100% | 100% | 100% | 94% | 98% | 33% | 7% | 47% | 100% | 100% | 57% | 268 | 15 |

| | Deterministic (objective) | | | | | | | | | LLM-as-judge | | | | | | Size | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Model | Det% | Tone | Mem-facts | Flow | Mechanics | Adapt | Eval-hyg | Format | Anti-AI | In-voice | Pushback | Q-fit | Mem-use | Grounded | Judge% | Out tok | n |
| claude-sonnet-4-6 | 98% | 98% | 95% | 99% | 100% | 100% | 100% | 92% | 98% | 80% | 60% | 100% | 100% | 100% | 88% | 705 | 5 |
| claude-opus-4-8 | 98% | 98% | 100% | 99% | 100% | 100% | 100% | 92% | 98% | 80% | 60% | 80% | 100% | 100% | 84% | 250 | 5 |
| gpt-5.5 | 97% | 95% | 95% | 99% | 99% | 100% | 100% | 93% | 100% | 70% | 50% | 80% | 100% | 100% | 80% | 416 | 10 |
| gpt-5.4 | 97% | 96% | 94% | 98% | 99% | 100% | 100% | 92% | 99% | 67% | 60% | 67% | 100% | 100% | 79% | 662 | 15 |
| claude-opus-4-6 | 98% | 98% | 96% | 99% | 100% | 100% | 100% | 92% | 98% | 40% | 40% | 80% | 100% | 100% | 72% | 325 | 5 |
| deepseek-v4-pro | 97% | 97% | 97% | 99% | 99% | 100% | 100% | 92% | 96% | 50% | 70% | 50% | 100% | 90% | 72% | 200 | 10 |
| kimi-k2.6 | 98% | 99% | 100% | 99% | 100% | 100% | 100% | 92% | 99% | 50% | 75% | 25% | 100% | 100% | 70% | 799 | 4 |
| claude-opus-4-7 | 98% | 99% | 100% | 99% | 98% | 100% | 100% | 91% | 98% | 20% | 50% | 80% | 100% | 100% | 70% | 128 | 10 |
| gemini-3.1-flash-lite | 97% | 98% | 96% | 97% | 100% | 100% | 100% | 92% | 98% | 30% | 80% | 50% | 100% | 90% | 70% | 105 | 10 |
| gemini-3.5-flash | 98% | 97% | 94% | 99% | 99% | 100% | 100% | 92% | 100% | 60% | 20% | 60% | 100% | 100% | 68% | 98 | 10 |
| grok-4.3 | 98% | 99% | 99% | 99% | 99% | 100% | 100% | 93% | 98% | 36% | 64% | 29% | 100% | 100% | 66% | 189 | 14 |
| deepseek-v4-flash | 97% | 97% | 98% | 99% | 100% | 100% | 100% | 92% | 96% | 70% | 40% | 20% | 100% | 100% | 66% | 168 | 10 |
| nemotron-3-120b | 96% | 97% | 93% | 97% | 98% | 100% | 100% | 90% | 97% | 0% | 50% | 75% | 100% | 75% | 60% | 401 | 4 |

| | Deterministic (objective) | | | | | | | | | LLM-as-judge | | | | | | Size | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Model | Det% | Tone | Mem-facts | Flow | Mechanics | Adapt | Eval-hyg | Format | Anti-AI | In-voice | Pushback | Q-fit | Mem-use | Grounded | Judge% | Out tok | n |
| kimi-k2.6 | 98% | 96% | 100% | 98% | 100% | 100% | 100% | 92% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 480 | 5 |
| claude-opus-4-6 | 99% | 99% | 100% | 98% | 100% | 100% | 100% | 94% | 99% | 100% | 80% | 100% | 100% | 80% | 92% | 204 | 5 |
| claude-sonnet-4-6 | 98% | 98% | 100% | 98% | 100% | 100% | 100% | 94% | 100% | 100% | 80% | 80% | 100% | 100% | 92% | 294 | 5 |
| claude-opus-4-8 | 97% | 98% | 98% | 98% | 100% | 100% | 100% | 92% | 96% | 100% | 60% | 80% | 100% | 100% | 88% | 104 | 5 |
| deepseek-v4-pro | 98% | 98% | 96% | 98% | 100% | 100% | 100% | 93% | 99% | 70% | 90% | 70% | 100% | 100% | 86% | 260 | 10 |
| gpt-5.4 | 98% | 96% | 96% | 98% | 100% | 100% | 100% | 93% | 99% | 80% | 87% | 60% | 100% | 100% | 85% | 718 | 15 |
| gemini-3.1-flash-lite | 97% | 96% | 97% | 96% | 100% | 100% | 100% | 93% | 98% | 70% | 80% | 70% | 100% | 100% | 84% | 136 | 10 |
| gpt-5.5 | 97% | 96% | 96% | 98% | 98% | 100% | 100% | 92% | 99% | 70% | 90% | 50% | 100% | 100% | 82% | 432 | 10 |
| claude-opus-4-7 | 99% | 100% | 100% | 99% | 100% | 100% | 100% | 94% | 99% | 50% | 60% | 90% | 100% | 100% | 80% | 104 | 10 |
| gemini-3.5-flash | 97% | 96% | 96% | 97% | 99% | 100% | 100% | 94% | 98% | 70% | 50% | 80% | 100% | 100% | 80% | 105 | 10 |
| deepseek-v4-flash | 97% | 97% | 98% | 98% | 100% | 100% | 100% | 92% | 98% | 50% | 90% | 60% | 100% | 90% | 78% | 230 | 10 |
| grok-4.3 | 98% | 99% | 100% | 98% | 99% | 100% | 100% | 93% | 99% | 40% | 67% | 67% | 100% | 100% | 75% | 173 | 15 |
| nemotron-3-120b | 96% | 95% | 96% | 98% | 98% | 100% | 100% | 91% | 96% | 0% | 100% | 60% | 100% | 100% | 72% | 510 | 5 |

| | Deterministic (objective) | | | | | | | | | LLM-as-judge | | | | | | Size | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Model | Det% | Tone | Mem-facts | Flow | Mechanics | Adapt | Eval-hyg | Format | Anti-AI | In-voice | Pushback | Q-fit | Mem-use | Grounded | Judge% | Out tok | n |
| claude-opus-4-7 | 99% | 100% | 100% | 97% | 100% | 100% | 100% | 95% | 99% | 50% | 20% | 90% | 100% | 100% | 72% | 54 | 10 |
| gpt-5.5 | 98% | 98% | 97% | 97% | 99% | 100% | 100% | 94% | 100% | 60% | 30% | 70% | 100% | 100% | 72% | 361 | 10 |
| deepseek-v4-flash | 98% | 98% | 98% | 97% | 100% | 100% | 100% | 92% | 98% | 30% | 20% | 100% | 100% | 100% | 70% | 186 | 10 |
| claude-opus-4-8 | 98% | 100% | 98% | 96% | 100% | 100% | 100% | 92% | 99% | 60% | 0% | 80% | 100% | 100% | 68% | 84 | 5 |
| gemini-3.1-flash-lite | 97% | 98% | 98% | 96% | 99% | 100% | 100% | 92% | 98% | 40% | 20% | 80% | 100% | 100% | 68% | 84 | 10 |
| nemotron-3-120b | 97% | 98% | 96% | 97% | 98% | 100% | 100% | 92% | 97% | 20% | 20% | 100% | 100% | 100% | 68% | 493 | 5 |
| gpt-5.4 | 98% | 99% | 95% | 96% | 100% | 100% | 100% | 93% | 100% | 60% | 20% | 60% | 93% | 100% | 67% | 453 | 15 |
| claude-sonnet-4-6 | 98% | 99% | 98% | 97% | 100% | 100% | 100% | 92% | 98% | 40% | 20% | 60% | 100% | 100% | 64% | 288 | 5 |
| claude-opus-4-6 | 98% | 98% | 96% | 97% | 100% | 100% | 100% | 92% | 98% | 40% | 20% | 60% | 100% | 100% | 64% | 280 | 5 |
| deepseek-v4-pro | 98% | 99% | 98% | 96% | 99% | 100% | 100% | 92% | 98% | 30% | 20% | 80% | 90% | 100% | 64% | 184 | 10 |
| grok-4.3 | 99% | 99% | 100% | 97% | 99% | 100% | 100% | 95% | 99% | 27% | 13% | 73% | 100% | 100% | 63% | 125 | 15 |
| gemini-3.5-flash | 97% | 96% | 95% | 95% | 99% | 100% | 100% | 91% | 99% | 22% | 11% | 67% | 100% | 100% | 60% | 403 | 9 |
| kimi-k2.6 | 95% | 92% | 91% | 95% | 100% | 100% | 100% | 92% | 96% | 0% | 0% | 100% | 100% | 0% | 40% | 1862 | 1 |

| | Deterministic (objective) | | | | | | | | | LLM-as-judge | | | | | | Size | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Model | Det% | Tone | Mem-facts | Flow | Mechanics | Adapt | Eval-hyg | Format | Anti-AI | In-voice | Pushback | Q-fit | Mem-use | Grounded | Judge% | Out tok | n |
| claude-opus-4-6 | 99% | 98% | 100% | 100% | 100% | 100% | 100% | 92% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 244 | 2 |
| deepseek-v4-pro | 99% | 98% | 100% | 100% | 98% | 100% | 100% | 94% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 263 | 4 |
| deepseek-v4-flash | 98% | 99% | 100% | 100% | 100% | 100% | 100% | 92% | 96% | 100% | 100% | 100% | 100% | 75% | 95% | 148 | 4 |
| grok-4.3 | 99% | 98% | 100% | 100% | 100% | 100% | 100% | 96% | 98% | 100% | 83% | 83% | 100% | 100% | 93% | 189 | 6 |
| claude-opus-4-8 | 99% | 100% | 100% | 100% | 100% | 100% | 100% | 92% | 100% | 100% | 100% | 50% | 100% | 100% | 90% | 206 | 2 |
| claude-opus-4-7 | 99% | 100% | 100% | 100% | 100% | 100% | 100% | 96% | 99% | 100% | 100% | 50% | 100% | 75% | 85% | 76 | 4 |
| gpt-5.5 | 99% | 97% | 100% | 100% | 100% | 100% | 100% | 96% | 100% | 100% | 100% | 50% | 100% | 75% | 85% | 233 | 4 |
| gemini-3.1-flash-lite | 98% | 97% | 100% | 100% | 100% | 100% | 100% | 92% | 99% | 100% | 100% | 75% | 75% | 75% | 85% | 80 | 4 |
| claude-sonnet-4-6 | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 96% | 100% | 100% | 50% | 50% | 100% | 100% | 80% | 252 | 2 |
| gpt-5.4 | 98% | 96% | 98% | 100% | 100% | 100% | 100% | 95% | 100% | 100% | 50% | 50% | 100% | 100% | 80% | 327 | 6 |
| gemini-3.5-flash | 98% | 95% | 100% | 100% | 100% | 100% | 100% | 96% | 100% | 100% | 75% | 50% | 75% | 75% | 75% | 61 | 4 |
| nemotron-3-120b | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 96% | 100% | 0% | 100% | 100% | 100% | 50% | 70% | 575 | 2 |
| kimi-k2.6 | 98% | 96% | 100% | 100% | 100% | 100% | 100% | 92% | 100% | 100% | 0% | 100% | 100% | 0% | 60% | 1615 | 1 |

| | Deterministic (objective) | | | | | | | | | LLM-as-judge | | | | | | Size | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Model | Det% | Tone | Mem-facts | Flow | Mechanics | Adapt | Eval-hyg | Format | Anti-AI | In-voice | Pushback | Q-fit | Mem-use | Grounded | Judge% | Out tok | n |
| claude-sonnet-4-6 | 99% | 98% | 100% | 100% | 100% | 100% | 100% | 96% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 410 | 2 |
| claude-opus-4-6 | 98% | 96% | 100% | 100% | 100% | 100% | 100% | 92% | 98% | 100% | 100% | 100% | 100% | 100% | 100% | 334 | 2 |
| gpt-5.5 | 98% | 97% | 98% | 100% | 100% | 100% | 100% | 92% | 100% | 100% | 75% | 100% | 100% | 100% | 95% | 545 | 4 |
| claude-opus-4-8 | 98% | 98% | 100% | 100% | 100% | 100% | 100% | 92% | 96% | 100% | 50% | 100% | 100% | 100% | 90% | 310 | 2 |
| deepseek-v4-flash | 99% | 98% | 100% | 100% | 100% | 100% | 100% | 94% | 100% | 100% | 75% | 50% | 100% | 100% | 85% | 118 | 4 |
| claude-opus-4-7 | 98% | 97% | 100% | 100% | 98% | 100% | 100% | 92% | 98% | 100% | 50% | 100% | 100% | 75% | 85% | 129 | 4 |
| deepseek-v4-pro | 98% | 99% | 100% | 100% | 100% | 100% | 100% | 92% | 95% | 100% | 50% | 75% | 100% | 100% | 85% | 269 | 4 |
| kimi-k2.6 | 99% | 98% | 100% | 100% | 100% | 100% | 100% | 96% | 100% | 100% | 50% | 100% | 100% | 50% | 80% | 828 | 2 |
| gpt-5.4 | 99% | 98% | 100% | 100% | 100% | 100% | 100% | 94% | 99% | 100% | 83% | 50% | 100% | 67% | 80% | 675 | 6 |
| grok-4.3 | 99% | 99% | 100% | 99% | 100% | 100% | 100% | 92% | 99% | 67% | 50% | 83% | 100% | 100% | 80% | 229 | 6 |
| gemini-3.1-flash-lite | 98% | 95% | 100% | 100% | 100% | 100% | 100% | 92% | 98% | 100% | 50% | 100% | 75% | 75% | 80% | 86 | 4 |
| nemotron-3-120b | 99% | 98% | 100% | 100% | 100% | 100% | 100% | 96% | 100% | 100% | 0% | 0% | 100% | 100% | 60% | 502 | 2 |
| gemini-3.5-flash | 98% | 98% | 98% | 99% | 100% | 100% | 100% | 94% | 98% | 100% | 25% | 75% | 50% | 50% | 60% | 92 | 4 |

| | Deterministic (objective) | | | | | | | | | LLM-as-judge | | | | | | Size | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Model | Det% | Tone | Mem-facts | Flow | Mechanics | Adapt | Eval-hyg | Format | Anti-AI | In-voice | Pushback | Q-fit | Mem-use | Grounded | Judge% | Out tok | n |
| claude-opus-4-8 | 98% | 98% | 100% | 97% | 100% | 100% | 100% | 92% | 100% | 100% | 50% | 100% | 100% | 100% | 90% | 109 | 2 |
| claude-opus-4-6 | 97% | 98% | 100% | 97% | 100% | 100% | 100% | 92% | 96% | 100% | 50% | 100% | 100% | 100% | 90% | 496 | 2 |
| gpt-5.5 | 97% | 94% | 98% | 97% | 100% | 100% | 100% | 92% | 100% | 100% | 75% | 75% | 100% | 100% | 90% | 641 | 4 |
| gemini-3.1-flash-lite | 98% | 98% | 100% | 96% | 100% | 100% | 100% | 92% | 99% | 100% | 50% | 100% | 100% | 75% | 85% | 99 | 4 |
| gemini-3.5-flash | 98% | 97% | 100% | 96% | 98% | 100% | 100% | 94% | 99% | 100% | 75% | 100% | 75% | 75% | 85% | 91 | 4 |
| grok-4.3 | 98% | 99% | 100% | 96% | 100% | 100% | 100% | 92% | 99% | 67% | 67% | 83% | 100% | 100% | 83% | 186 | 6 |
| claude-opus-4-7 | 98% | 98% | 100% | 99% | 100% | 100% | 100% | 92% | 99% | 75% | 50% | 75% | 100% | 100% | 80% | 99 | 4 |
| deepseek-v4-flash | 98% | 96% | 100% | 97% | 100% | 100% | 100% | 92% | 99% | 100% | 50% | 75% | 100% | 75% | 80% | 162 | 4 |
| gpt-5.4 | 97% | 94% | 100% | 99% | 100% | 100% | 100% | 92% | 98% | 83% | 50% | 67% | 100% | 67% | 73% | 653 | 6 |
| nemotron-3-120b | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 96% | 100% | 50% | 50% | 50% | 100% | 100% | 70% | 164 | 2 |
| claude-sonnet-4-6 | 98% | 96% | 100% | 95% | 100% | 100% | 100% | 96% | 100% | 50% | 50% | 50% | 100% | 100% | 70% | 96 | 2 |
| deepseek-v4-pro | 97% | 96% | 98% | 97% | 100% | 100% | 100% | 92% | 98% | 50% | 50% | 50% | 100% | 75% | 65% | 266 | 4 |

| | Deterministic (objective) | | | | | | | | | LLM-as-judge | | | | | | Size | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Model | Det% | Tone | Mem-facts | Flow | Mechanics | Adapt | Eval-hyg | Format | Anti-AI | In-voice | Pushback | Q-fit | Mem-use | Grounded | Judge% | Out tok | n |
| nemotron-3-120b | 99% | 100% | 100% | 97% | 100% | 100% | 100% | 96% | 100% | 100% | 50% | 100% | 100% | 100% | 90% | 1308 | 2 |
| claude-opus-4-6 | 99% | 100% | 100% | 97% | 100% | 100% | 100% | 92% | 100% | 100% | 50% | 100% | 100% | 100% | 90% | 64 | 2 |
| claude-opus-4-7 | 99% | 100% | 100% | 97% | 100% | 100% | 100% | 92% | 100% | 100% | 50% | 100% | 100% | 100% | 90% | 48 | 4 |
| claude-sonnet-4-6 | 99% | 100% | 100% | 97% | 100% | 100% | 100% | 92% | 100% | 100% | 50% | 100% | 100% | 100% | 90% | 67 | 2 |
| gpt-5.4 | 99% | 100% | 100% | 97% | 100% | 100% | 100% | 92% | 100% | 100% | 50% | 100% | 100% | 100% | 90% | 290 | 6 |
| gemini-3.1-flash-lite | 97% | 97% | 98% | 97% | 100% | 100% | 100% | 92% | 98% | 100% | 50% | 100% | 100% | 100% | 90% | 90 | 4 |
| deepseek-v4-pro | 98% | 100% | 100% | 97% | 100% | 100% | 100% | 92% | 99% | 75% | 50% | 100% | 100% | 100% | 85% | 110 | 4 |
| claude-opus-4-8 | 99% | 100% | 100% | 97% | 100% | 100% | 100% | 92% | 100% | 100% | 0% | 100% | 100% | 100% | 80% | 128 | 2 |
| gpt-5.5 | 99% | 100% | 100% | 97% | 100% | 100% | 100% | 92% | 100% | 50% | 50% | 100% | 100% | 100% | 80% | 340 | 4 |
| deepseek-v4-flash | 98% | 100% | 100% | 97% | 100% | 100% | 100% | 92% | 99% | 75% | 25% | 100% | 100% | 100% | 80% | 112 | 4 |
| kimi-k2.6 | 98% | 100% | 100% | 95% | 100% | 100% | 100% | 92% | 100% | 100% | 0% | 100% | 100% | 100% | 80% | 1315 | 1 |
| grok-4.3 | 99% | 100% | 100% | 97% | 100% | 100% | 100% | 94% | 100% | 33% | 50% | 100% | 100% | 100% | 77% | 45 | 6 |
| gemini-3.5-flash | 97% | 93% | 100% | 97% | 100% | 100% | 100% | 92% | 99% | 25% | 75% | 100% | 100% | 0% | 60% | 621 | 4 |
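
The tables do not say how Judge% is derived. In the rows shown, it is consistent with the unweighted mean of the five LLM-as-judge columns (In-voice, Pushback, Q-fit, Mem-use, Grounded), rounded to a whole percent. That is an inference from the displayed numbers, not a documented formula. The sketch below checks it for one row; the helper names (`parse_row`, `judge_mean`, `is_consistent`) and the tolerance are illustrative assumptions.

```python
# Column order as it appears in every table of this report.
HEADER = [
    "Model", "Det%", "Tone", "Mem-facts", "Flow", "Mechanics", "Adapt",
    "Eval-hyg", "Format", "Anti-AI", "In-voice", "Pushback", "Q-fit",
    "Mem-use", "Grounded", "Judge%", "Out tok", "n",
]

# Assumption: Judge% aggregates exactly these five judge columns.
JUDGE_COLUMNS = ["In-voice", "Pushback", "Q-fit", "Mem-use", "Grounded"]


def parse_row(line: str) -> dict:
    """Split one Markdown table row into a {column name: cell text} dict."""
    cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
    return dict(zip(HEADER, cells))


def pct(cell: str) -> float:
    """Convert a cell such as '93%' to the float 93.0."""
    return float(cell.rstrip("%"))


def judge_mean(row: dict) -> float:
    """Unweighted mean of the five judge columns, in percent."""
    return sum(pct(row[name]) for name in JUDGE_COLUMNS) / len(JUDGE_COLUMNS)


def is_consistent(row: dict, tol: float = 1.0) -> bool:
    """True if the reported Judge% is within `tol` points of the mean.

    The tolerance allows for the cells already being rounded.
    """
    return abs(judge_mean(row) - pct(row["Judge%"])) <= tol


# The grok-4.3 row from the first table, copied verbatim.
ROW = (
    "| grok-4.3 | 99% | 100% | 91% | 100% | 100% | 100% | 100% | 97% | 100% "
    "| 100% | 100% | 100% | 100% | 67% | 93% | 125 | 3 |"
)

row = parse_row(ROW)
print(row["Model"], round(judge_mean(row), 1), row["Judge%"], is_consistent(row))
```

For this row the five judge cells are 100, 100, 100, 100, and 67, whose mean is 93.4, against a reported Judge% of 93%. Running the same check over every row is a cheap way to catch transcription errors in the tables.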