- For enterprises, “best model” is rarely decisive. The decisive factor is operational reliability: permissioning, audit trails, evaluation loops, and support.
- Anthropic tends to win on tool-use reliability and long-horizon task coherence. OpenAI wins on breadth and product surface area. Google wins on distribution and ecosystem integration.
- The highest-cost failure mode is not a bad answer — it is a model that behaves inconsistently across runs. Consistency beats peak capability in production.
- Procurement should grade vendors on six axes, then weight them by your constraints. This dossier gives the axes.
The principle: judge outcomes, not demos.
Vendor demos optimize for the wrong thing: the single best run. Production depends on the median run: under load, with partial context, and in front of a human who will not tolerate surprises. That is why enterprises keep buying “the best model” and shipping nothing.
The scorecard below is the antidote: it forces your organization to grade what actually matters after month two.
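To make the best-run versus median-run distinction concrete, here is a minimal sketch in Python. The function name `summarize_runs` and both score lists are hypothetical, invented for illustration; in practice the scores would come from your own graded runs of a real workflow.

```python
from statistics import median


def summarize_runs(scores):
    """Summarize repeated runs of one task: best run, median run, and spread."""
    if not scores:
        raise ValueError("need at least one run score")
    return {
        "best": max(scores),
        "median": median(scores),
        "spread": max(scores) - min(scores),
    }


# Hypothetical scores (0 to 1) from ten runs of the same workflow.
# Vendor A has the better demo; Vendor B has the better median.
vendor_a = [0.95, 0.40, 0.90, 0.35, 0.92, 0.45, 0.38, 0.91, 0.42, 0.50]
vendor_b = [0.80, 0.78, 0.82, 0.79, 0.81, 0.80, 0.77, 0.83, 0.79, 0.80]

summary_a = summarize_runs(vendor_a)
summary_b = summarize_runs(vendor_b)

for name, summary in (("A", summary_a), ("B", summary_b)):
    print(f"Vendor {name}: {summary}")
```

With these made-up numbers, Vendor A wins on its best run and loses on the median, which is the pattern a demo hides and production exposes.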
The scorecard: six axes that don’t lie.
| Axis | What you test | Failure looks like |
|---|---|---|
| Capability | Your actual workflows, not benchmarks | Great at trivia, bad at your job |
| Reliability | Consistency across repeated runs, under load | Works today, drifts tomorrow |
| Tool use | Calls tools correctly, recovers from errors | Silent failure, wrong action, no escalation |
| Governance | Audit trails, retention, admin, policy | You can’t explain what happened |
| Economics | Cost per successful outcome, not per token | Cheap calls, expensive failures |
| Support | Incident response, roadmap clarity | You are your own vendor |
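One way to turn the table into a number is a weighted average of per-axis grades, plus a separate calculation for the Economics row. The sketch below is illustrative only: the 1 to 5 grading scale, the weights, the vendor grades, and the dollar figures are all assumptions, not data from any vendor.

```python
AXES = ("capability", "reliability", "tool_use", "governance", "economics", "support")


def weighted_score(grades, weights):
    """Weighted average of per-axis grades; weights need not sum to 1."""
    missing = [axis for axis in AXES if axis not in grades or axis not in weights]
    if missing:
        raise ValueError(f"missing axes: {missing}")
    total_weight = sum(weights[axis] for axis in AXES)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(grades[axis] * weights[axis] for axis in AXES) / total_weight


def cost_per_success(total_cost, successes):
    """Cost per successful outcome; infinite if nothing succeeded."""
    if successes == 0:
        return float("inf")
    return total_cost / successes


# Hypothetical weights for an organization that prioritizes reliability and tool use.
weights = {"capability": 1, "reliability": 3, "tool_use": 3,
           "governance": 2, "economics": 1, "support": 1}

# Hypothetical grades on an assumed 1 (poor) to 5 (strong) scale.
vendor_x = {"capability": 5, "reliability": 3, "tool_use": 3,
            "governance": 3, "economics": 4, "support": 3}
vendor_y = {"capability": 4, "reliability": 5, "tool_use": 4,
            "governance": 4, "economics": 3, "support": 4}

score_x = weighted_score(vendor_x, weights)
score_y = weighted_score(vendor_y, weights)

# Economics: cheap calls can still mean expensive outcomes (made-up figures).
cheap = cost_per_success(total_cost=100.0, successes=200)
pricey = cost_per_success(total_cost=300.0, successes=900)
```

Under these assumed weights, the vendor with the lower capability grade scores higher overall, and the option that costs three times as much per batch is cheaper per successful outcome. Change the weights to match your constraints and the ranking may flip, which is the point of weighting.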
Who wins where (as of May 2026).
In practice, enterprises pick a “default” and a “fallback.” The defaults differ by what your organization is optimizing for:
- Anthropic if tool-use reliability and long-horizon coherence are mission-critical.
- OpenAI if surface area and breadth of capability matter more than deterministic behavior.
- Google if your distribution lives inside the Google ecosystem and procurement wants a single vendor story.
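If the default and fallback are wired together at runtime (an assumption; the pairing can also be a purely contractual choice), the routing logic can be as small as the sketch below. Every name here is hypothetical: `default`, `fallback`, and `is_acceptable` stand in for whatever client calls and acceptance checks your organization actually uses, and no vendor API is implied.

```python
from typing import Callable, Tuple


def run_with_fallback(
    task: str,
    default: Callable[[str], str],
    fallback: Callable[[str], str],
    is_acceptable: Callable[[str], bool],
) -> Tuple[str, str]:
    """Try the default provider; use the fallback on error or rejected output.

    Returns (provider_label, result) so the choice can be written to an audit trail.
    """
    try:
        result = default(task)
        if is_acceptable(result):
            return ("default", result)
    except Exception:
        # A production system would log the exception before falling back.
        pass
    return ("fallback", fallback(task))


# Hypothetical stand-ins for two providers.
def flaky_default(task: str) -> str:
    raise TimeoutError("simulated outage")


def steady_fallback(task: str) -> str:
    return f"handled: {task}"


provider, output = run_with_fallback(
    "summarize contract",
    default=flaky_default,
    fallback=steady_fallback,
    is_acceptable=lambda text: bool(text.strip()),
)
```

Returning the provider label alongside the result keeps the Governance axis in view: you can still explain which system produced a given answer.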
The strongest platform is the one your organization can run without heroics. — Procurement lead, Fortune 200 (on background)