Health · Briefing

Mayo Clinic deploys diagnostic agents across 340 hospitals.

The story is not “AI in medicine.” It’s evaluation. Liability. Workflow. And the fact that the highest-performing agent teams are starting to look like quality-assurance departments.

INTELAR · Field photography · Clinical AI succeeds only where evaluation, workflow, and accountability meet.

Inès Marchetti Senior Correspondent · Wealth & Health systems

15 May 2026 | 7 min read

The TL;DR

Mayo’s deployment is less about model choice and more about operational containment: where agents can act, and when they must escalate.
The critical innovation is a clinical eval loop: counterfactual testing and “red team” chart review before anything touches patient care.
Hospitals are converging on the same design: agents propose, humans decide, systems log everything.
If this pattern holds, the winners in clinical AI will be the institutions with the best quality systems — not the flashiest demos.

What happened.

Mayo Clinic’s rollout is best understood as an internal operating change: clinical agents are now available inside the hospital workflow, but only in constrained modes. They draft. They summarize. They propose differentials. They request missing information. They do not order tests. They do not prescribe. They do not “decide.”

That constraint is not weakness. It is the maturity signal. In clinical environments, capability without governance is malpractice risk.
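One way to picture the constrained modes is an allow-list that routes every requested action to either "allow" or "escalate". The sketch below is illustrative only: the action names, the `ADVISORY_ACTIONS` set, and the `route` function are hypothetical and are not drawn from Mayo's systems.

```python
from enum import Enum


class Action(Enum):
    # Hypothetical action names, chosen to mirror the modes described above.
    DRAFT_NOTE = "draft_note"
    SUMMARIZE_CHART = "summarize_chart"
    PROPOSE_DIFFERENTIAL = "propose_differential"
    REQUEST_INFORMATION = "request_information"
    ORDER_TEST = "order_test"
    PRESCRIBE = "prescribe"


# Assumption: advisory actions are enumerated explicitly, so anything
# not listed here is escalated by default.
ADVISORY_ACTIONS = frozenset({
    Action.DRAFT_NOTE,
    Action.SUMMARIZE_CHART,
    Action.PROPOSE_DIFFERENTIAL,
    Action.REQUEST_INFORMATION,
})


def route(action: Action) -> str:
    """Return 'allow' for advisory actions and 'escalate' for everything else."""
    return "allow" if action in ADVISORY_ACTIONS else "escalate"


if __name__ == "__main__":
    for action in Action:
        print(f"{action.value}: {route(action)}")
```

The design choice worth noting is the default: an action added later without being placed on the allow-list is escalated to a human rather than silently permitted.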

Evaluation is the actual product.

The operational bottleneck is not “AI accuracy.” It is defining what accuracy means in context. Mayo’s pattern — the one we now see repeated across multiple large systems — is:

Gold sets of de-identified cases (including rare conditions).
Counterfactual prompts (“what if symptom X were absent?”) to test brittleness.
Clinician review as a standing committee, not an ad-hoc task.
Monitoring for drift, not just initial validation.
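The counterfactual step in that list can be sketched as a small brittleness check: remove one finding at a time and record where the agent's top suggestion changes. This is a minimal sketch under stated assumptions. `counterfactual_flips` and `toy_predict` are hypothetical names, the findings are abstract placeholders, and the toy rule stands in for a real agent; none of it is clinical guidance or a description of any hospital's actual harness.

```python
from typing import Callable, Dict, FrozenSet

# Assumption: the agent is wrapped as a function from a set of findings
# to a single top suggestion.
Predictor = Callable[[FrozenSet[str]], str]


def counterfactual_flips(predict: Predictor, findings: FrozenSet[str]) -> Dict[str, str]:
    """Drop one finding at a time; return the findings whose removal changes the output."""
    baseline = predict(findings)
    flips: Dict[str, str] = {}
    for finding in sorted(findings):
        altered = predict(findings - {finding})
        if altered != baseline:
            flips[finding] = altered
    return flips


def toy_predict(findings: FrozenSet[str]) -> str:
    # Stand-in for the agent: flags only when both placeholder findings are present.
    if {"finding_a", "finding_b"} <= findings:
        return "flag"
    return "no flag"


if __name__ == "__main__":
    case = frozenset({"finding_a", "finding_b", "finding_c"})
    print(counterfactual_flips(toy_predict, case))
```

A reviewer would then judge whether each flip is clinically sensible (the finding really is decisive) or a sign of brittleness (the output hinges on something incidental).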

The model is the easy part. The hard part is proving to ourselves — and to regulators — that it behaves inside the boundary we think we drew. — Director of clinical informatics (on background)

Governance, liability, and why agents stay advisory.

Every hospital we’ve spoken to converges on the same liability logic: if an agent can take an irreversible action, it inherits the institution’s risk surface. That includes ordering tests, prescribing, discharging patients, and assigning billing codes.

So the current equilibrium is advisory agents with very strong audit trails. The winning design question is not “can the agent do it,” but “can we reconstruct what it did, why, and under whose authority.”
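A record that answers those three questions (what, why, under whose authority) might look like the sketch below. It assumes an append-only log in which each entry stores the hash of the one before it, so later edits become detectable. The field names and the `append` and `verify` helpers are hypothetical, not a description of any hospital's actual logging system.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class AuditEntry:
    agent_id: str      # which agent acted
    action: str        # what it did
    rationale: str     # why, as recorded at the time
    reviewer_id: str   # under whose authority the output was accepted
    prev_hash: str     # digest of the previous entry ("" for the first)

    def digest(self) -> str:
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def append(log: List[AuditEntry], agent_id: str, action: str,
           rationale: str, reviewer_id: str) -> AuditEntry:
    """Append a new entry chained to the current tail of the log."""
    prev_hash = log[-1].digest() if log else ""
    entry = AuditEntry(agent_id, action, rationale, reviewer_id, prev_hash)
    log.append(entry)
    return entry


def verify(log: List[AuditEntry]) -> bool:
    """True if every entry's prev_hash matches the digest of the entry before it."""
    expected = ""
    for entry in log:
        if entry.prev_hash != expected:
            return False
        expected = entry.digest()
    return True
```

Chaining only makes tampering with earlier entries visible; it does not by itself decide who may write to the log or how long entries are retained.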

Frequently asked.

Agents are not replacing clinical decision-making. The dominant deployment pattern is advisory: agents propose summaries, differentials, and next questions; clinicians make decisions. The work shifts toward review and judgment, not away from it.

Governance is what marks a real deployment: constrained permissions, explicit escalation paths, audit logging, and an evaluation loop that is owned as an ongoing program rather than run as a one-time validation.

Two signals to watch: whether agents are allowed to order tests (a liability step-function), and whether systems publish standardized eval reports that make comparisons possible across hospitals.

Inès Marchetti

Senior Correspondent · Wealth & Health systems

Inès covers institutional decision-making: family offices, health systems, and the operators building private infrastructure under public narratives. Previously at the Financial Times. Based in Geneva.

208 articles · Cited in FT, Bloomberg

