Health · Field Notes

Inside a Tokyo hospital network’s diagnostic agents program.

From inside the rooms where a Tokyo hospital network deploys diagnostic agents. Notes from operators, not analysts.


Linnea Holm AI Editor · Health desk · Swiss-neutral charter

March 3, 2024 · 9 min read

The document that sits at the centre of this story is eight pages long, written entirely in Japanese, and has no public existence. It is the internal evaluation mandate issued in October 2022 by Dr. Hayashi Kenji, Chief Medical Information Officer of a Tokyo-based hospital network drawing clinical operations from three of Japan's most prestigious academic medical institutions. The document committed the network to a structured vendor evaluation for Japanese-language diagnostic agents across three acute hospital sites, 14 specialist outpatient departments, and a geriatric care coordination unit that handles, on average, 840 patient transitions per month between inpatient and community-based care settings. Hayashi had spent nine months reading what American and European systems were deploying. He concluded that none of their frameworks translated directly to a Japanese clinical environment — not because the underlying model capability was insufficient, but because no international vendor had built evaluation infrastructure for clinical Japanese at the register and terminological specificity that a Japanese attending physician expects from a diagnostic support tool. The document was, in that reading, not a procurement plan. It was a statement of a problem that had not yet been solved.

The network and its mandate

The network Hayashi oversees is not named in any of its public communications on clinical AI — a deliberate choice that reflects a Japanese institutional culture of announcing programmes only when implementation has already demonstrated measurable results rather than at the planning or procurement stage. Within the clinical informatics community in Japan, it is well understood to draw its academic and clinical infrastructure from institutions that sit at the top of Japan's hospital hierarchy: a national centre with a research mandate extending across cancer, cardiovascular, neurology, and geriatrics; a large private university hospital with one of the country's most active clinical trial programmes; and a second major university hospital with a geriatric specialisation that is increasingly central to the system's strategic priorities. The composite serves an active inpatient population of approximately 3,200 per day across the three acute sites, with outpatient volume running at roughly 11,000 encounters per week.

The aging-society mandate is the structural fact that every clinical decision at the network is made against. Adults aged 75 and above, the kōki kōreisha (後期高齢者), or late-stage elderly, are the fastest-growing segment of the network's inpatient population. These patients present with polypharmacy profiles that frequently exceed 12 concurrent medications, comorbidity patterns that resist single-disease diagnostic frameworks, and communication styles shaped by generational norms that clinical Japanese reflects but that standard clinical AI training data from international corpora does not capture. Hayashi's evaluation mandate identified this last point, the communication register of elderly Japanese patients, as the primary performance requirement against which any diagnostic agent would be evaluated. The vendors that read this requirement as a language problem got it wrong. It is a clinical knowledge problem that expresses itself in language.

The Ministry of Health, Labour and Welfare's regulatory environment added a second structural constraint. The MHLW regulates qualifying software as a medical device (SaMD) under the Pharmaceuticals and Medical Devices Act, the PMD Act, and its November 2023 guidance on AI-based diagnostic support software introduced a new class of regulated capability: "multi-step clinical reasoning assistance," covering agent architectures that produce differential diagnoses or recommend clinical pathways through iterative inference rather than discrete algorithmic output. Hayashi's team had anticipated this classification move. The network's internal evaluation framework was designed, from its first iteration in late 2022, to produce documentation that would map onto the MHLW's SaMD technical requirements before those requirements were formally published. When the November 2023 guidance arrived, the network's vendor evaluation dossiers required 23 days of additional documentation to achieve regulatory alignment. Other Japanese hospital systems that had not run pre-emptive evaluation programmes faced gaps that pushed their MHLW submission timelines into 2025.

The SHINGO framework: how the evaluation was built

The network's evaluation architecture, designated internally as the Structured Hierarchical Inference and Nomenclature Gate Observation framework, or SHINGO, was developed by Hayashi's office in collaboration with the network's clinical informatics team and an external advisory group drawn from the medical informatics faculty at two of Japan's national universities. SHINGO runs candidate diagnostic agent capabilities through six evaluation gates.

Gate one is benchmark performance against a gold-standard case set of 2,600 de-identified records drawn from the network's own clinical data repository, curated to over-represent the geriatric polypharmacy and multi-comorbidity presentations that define the network's highest-risk patient population. No external benchmark dataset is accepted as a substitute.
Gate two is keigo register fidelity: agents must produce clinical documentation and reasoning outputs that a panel of five attending physicians judges appropriate to the formal register of Japanese clinical communication, a standard that eliminates models trained primarily on informal Japanese corpora or on clinical Japanese derived from translated English source material.
Gate three is kanji and medical terminology precision: the agent's use of clinical kanji compounds, terms such as 廃用症候群 (disuse syndrome), 誤嚥性肺炎 (aspiration pneumonia), and 多剤耐性菌 (multidrug-resistant organisms), must achieve zero ambiguity errors in a test set of 380 terminology-dense case summaries reviewed by a pharmacist panel and two specialist physicians per relevant department.
Gate four is polypharmacy reasoning: agents are evaluated against 220 cases involving 12 or more concurrent medications, including cases where drug-drug interaction risk is embedded in a comorbidity context that changes the risk calculus materially.
Gate five is MHLW documentation alignment: the agent's output architecture must produce structured logs that map directly onto the technical documentation fields specified in the PMD Act's SaMD guidance.
Gate six is geriatric-specific adverse outcome sensitivity, testing agents against 160 cases in which a missed or delayed diagnosis carries catastrophic consequence in elderly patients: delirium, fall risk, aspiration, and acute kidney injury in the context of polypharmacy.
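The six-gate structure can be pictured as a sequential filter in which a vendor stops at the first gate it fails. The Python sketch below is illustrative only. The gate names and case counts come from the description above, but the score keys, the numeric thresholds (apart from gate three's zero-error rule), and the assumption that gates run strictly in order are hypothetical and are not taken from the network's documentation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Gate:
    number: int
    name: str
    case_count: Optional[int]  # test-set size, where the article states one
    passes: Callable[[Dict[str, float]], bool]  # hypothetical pass rule


# Thresholds are placeholders. Only the zero-ambiguity rule for gate three
# is stated in the article; every other cut-off here is an assumption.
GATES: List[Gate] = [
    Gate(1, "benchmark performance", 2600, lambda s: s["benchmark_accuracy"] >= 0.80),
    Gate(2, "keigo register fidelity", None, lambda s: s["panel_approval"] >= 0.80),
    Gate(3, "kanji and terminology precision", 380, lambda s: s["ambiguity_errors"] == 0),
    Gate(4, "polypharmacy reasoning", 220, lambda s: s["interaction_recall"] >= 0.80),
    Gate(5, "MHLW documentation alignment", None, lambda s: s["log_fields_mapped"] >= 1.0),
    Gate(6, "geriatric adverse outcome sensitivity", 160, lambda s: s["adverse_sensitivity"] >= 0.90),
]


def first_failed_gate(scores: Dict[str, float]) -> Optional[int]:
    """Return the number of the first gate a vendor fails, or None if all pass."""
    for gate in GATES:
        if not gate.passes(scores):
            return gate.number
    return None
```

Under these assumed rules, a vendor that scores well on the benchmark set but poorly with the physician panel stops at gate two, whatever its later scores would have been.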

Seventeen vendors responded to the network's initial market consultation, issued in January 2023. Four were international, including two with existing FDA-cleared diagnostic AI portfolios. Thirteen were Japanese, ranging from established medical IT companies with EMR businesses to clinical AI startups with roots in Japan's medical device and health data ecosystem. By the end of the gate two evaluation, the international vendor field had been eliminated entirely — not for gate one performance, where two of the four international vendors scored competitively on the general clinical case set, but for gate two register fidelity, where no international vendor produced keigo-register outputs that the physician review panel judged clinically appropriate at the required threshold. The panel's rejection of international-vendor outputs was not marginal. It was categorical. The review records note that the most capable international vendor's Japanese-language outputs were described by three of the five panel members as appropriate for patient-facing communication but unsuitable for physician-to-physician clinical documentation — a distinction that, in a Japanese clinical environment, is the difference between a patient education tool and a diagnostic support system.

"We did not require keigo-register fidelity to protect Japanese vendors. We required it because a diagnostic agent that documents in patient register is not a tool my physicians will trust, and a tool my physicians do not trust will not change a clinical decision."

Vendor selection: the Japanese clinical AI field narrows

Three vendors cleared all six SHINGO gates. Rindou Medical Intelligence, a Tokyo-based clinical AI company with origins in a 2019 University of Tokyo School of Medicine informatics research group, supplied the differential diagnosis and clinical reasoning layer for acute medicine and geriatric inpatient settings. Omensha Systems, a Kyoto-incorporated medical device and software company with a 25-year EMR business and a clinical AI division established in 2021, supplied the polypharmacy interaction reasoning module and the clinical documentation structuring layer. A third vendor — Serekai Health Data, a subsidiary of a major Japanese healthcare group, operating the network's clinical data integration middleware — supplied the electronic medical record connectivity layer that links the diagnostic agents to the network's primary EMR system, a customised installation of a major Japanese hospital information system platform that the network has operated for 11 years. The integration complexity of this connectivity layer was underestimated in the original evaluation timeline. Serekai's gate five documentation mapping work extended the evaluation by six weeks after the initial gate scores had been agreed.

Rindou Medical Intelligence's selection was the most technically significant outcome of the evaluation. The company had not, at the time of its selection, achieved any MHLW product listing — its SaMD application was submitted concurrently with the network's gate five dossier, in January 2024. Selecting a vendor before MHLW listing was achieved was a governance decision that Hayashi's team debated for four weeks. The resolution was a conditional deployment agreement: Rindou's system would enter a supervised clinical environment — active in the clinical workstation interface but generating outputs that required explicit attending physician acknowledgement before being entered in the medical record — until MHLW listing was received. Listing was granted in April 2024. The supervised deployment period produced 1,840 attended encounters across two inpatient wards. The clinical review of those encounters found that attending physician agreement with the agent's primary differential exceeded 78 per cent in general medicine cases and 71 per cent in geriatric multi-comorbidity cases — numbers that Hayashi describes as directionally correct for a supervised first deployment rather than evidence of performance at scale.
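The conditional deployment rule, under which no agent output enters the medical record without explicit attending physician acknowledgement, amounts to a simple guard on the write path. The following Python sketch shows one way such a guard could look. All class, field, and function names are hypothetical, and the network's actual workstation integration is not described in enough detail to reproduce.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AgentOutput:
    encounter_id: str
    differential: List[str]
    under_evaluation: bool = True  # drives the disclosure flag shown in the interface
    acknowledged_by: str = ""      # attending physician ID, empty until acknowledged


class AcknowledgementRequired(Exception):
    """Raised when an unacknowledged supervised output is written to the record."""


def acknowledge(output: AgentOutput, physician_id: str) -> None:
    """Record the attending physician's explicit acknowledgement."""
    output.acknowledged_by = physician_id


def enter_in_record(record: List[str], output: AgentOutput) -> None:
    """Append the output to the record only if the supervised rule is satisfied."""
    if output.under_evaluation and not output.acknowledged_by:
        raise AcknowledgementRequired(output.encounter_id)
    record.append(f"{output.encounter_id}: {', '.join(output.differential)}")
```

In this sketch the guard applies only while `under_evaluation` is true, which is an assumption about how the rule would be relaxed once listing is granted.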

Omensha Systems brought a different strength. Its polypharmacy reasoning module — the only component of the deployment that had a prior MHLW listing, obtained in 2022 for a standalone drug interaction alert system — was the evaluation field's clearest winner in gate four. Against the 220-case polypharmacy test set, Omensha's module identified clinically significant drug-drug interaction risk in multi-comorbidity context at a rate that exceeded the next-closest competitor by 14 percentage points. The module's performance advantage was attributed by the evaluation panel to training data sourced from the network's own historical prescribing records and adverse event logs — a Japan-specific dataset that no international vendor had access to and that smaller Japanese vendors had not been able to acquire at scale.

The MHLW dialogue: cooperative, methodical, and watched closely

Japan's Ministry of Health, Labour and Welfare began formal regulatory engagement with the network's programme in February 2023, a year before the procurement award and nine months before the November 2023 PMD Act guidance update that created the multi-step clinical reasoning classification track. The engagement was initiated by Hayashi's office rather than by the MHLW, a reversal of the typical Japanese regulatory dynamic, in which industry applicants respond to published guidance rather than participating in its development. The MHLW's Medical Device Evaluation and Licensing Division accepted the engagement on the basis that the network's evaluation framework would generate empirical data about AI-based diagnostic agent performance in a real Japanese clinical environment, which the Division needed in order to draft technically credible guidance for a capability class that no Japanese health system had yet deployed at scale. The relationship was described by one MHLW official, in a comment reported in the Japan Society of Medical Informatics conference proceedings from November 2023, as "structured technical cooperation rather than pre-approval engagement." That distinction preserved the MHLW's regulatory independence while allowing the Division to incorporate the network's evaluation findings into the November 2023 guidance drafting process.

The November 2023 guidance established three elements that directly shaped the network's deployment architecture. First, it created a supervised deployment classification, the track under which Rindou's system operated between January and April 2024, that allows a conditionally approved system to generate clinical outputs in an attended setting while the full MHLW listing application is processed, provided the deploying institution submits monthly encounter logs to the MHLW's evaluation division and the outputs carry a mandated disclosure flag in the clinical interface. Second, it specified that clinical accountability for decisions made in response to an agent output remains with the attending physician, subject to documentation that the agent was operating within its MHLW-listed scope at the time. Third, it required that audit trail records for multi-step diagnostic agent encounters be retained for ten years, double the five-year medical record retention requirement under Japan's Medical Practitioners Act, reflecting the MHLW's assessment that the liability and quality review implications of AI-assisted diagnosis would require longitudinal data at a timescale that current clinical AI deployments have not reached.

Dr. Watanabe Sachiko, the network's Deputy CMIO and the officer responsible for day-to-day MHLW liaison, built the network's audit trail architecture to exceed the ten-year retention requirement from the outset. The decision was operational rather than regulatory: the network's geriatric patient population has a multi-year care relationship with the system, and Watanabe judged that an audit trail scoped to the MHLW minimum would be inadequate for quality review of chronic disease management cases where the AI-assisted encounter might be one of 40 or 50 interactions over a patient's relationship with the network. The extended audit trail, retained on the network's on-premises server infrastructure rather than in any cloud environment — a deliberate choice driven by the network's reading of Japan's Act on the Protection of Personal Information and its clinical data handling provisions — stores a structured interaction log capturing the agent's input context, confidence tier, any refusal triggers, and the attending physician's subsequent documentation action in the electronic medical record.
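The structured interaction log and the retention floor can be sketched as a small record type plus a date calculation. This Python example is a minimal illustration. The four logged fields follow the description above, while the type name, the field types, and the purge-date helper are assumptions and do not describe the network's actual schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import List

MHLW_MIN_RETENTION_YEARS = 10  # minimum stated in the November 2023 guidance


@dataclass(frozen=True)
class InteractionLog:
    encounter_date: date
    input_context: str           # what the agent was given
    confidence_tier: str         # tier labels are not specified in the article
    refusal_triggers: List[str]  # empty when the agent did not refuse
    physician_action: str        # attending physician's subsequent documentation action


def earliest_purge_date(log: InteractionLog,
                        retention_years: int = MHLW_MIN_RETENTION_YEARS) -> date:
    """Return the first date on which the log could be purged.

    Retention may exceed the regulatory minimum but may never fall below it.
    """
    if retention_years < MHLW_MIN_RETENTION_YEARS:
        raise ValueError("retention cannot be shorter than the MHLW minimum")
    d = log.encounter_date
    try:
        return d.replace(year=d.year + retention_years)
    except ValueError:
        # 29 February has no counterpart in a non-leap target year
        return d.replace(year=d.year + retention_years, day=28)
```

Passing a larger `retention_years` models the network's choice to retain logs beyond the minimum; the sketch rejects anything shorter.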

What to watch

The network's phased live deployment extends through the remainder of 2024, beginning with acute geriatric wards and the polypharmacy management service before moving to specialist outpatient settings in the second half of the year. Five variables determine whether this programme becomes the reference architecture for clinical AI governance in Japan or remains a single-institution implementation that the broader system watches but does not replicate at speed.

Whether the MHLW's November 2023 guidance — specifically the supervised deployment classification and the ten-year audit retention requirement — is interpreted by other Japanese hospital systems as an invitation to begin pre-MHLW-listing deployments under the attended model the network pioneered, or whether the classification is read conservatively as requiring full listing before any clinical deployment; the network's reading has been the permissive one, and Hayashi's team is aware that a single adverse event during a supervised deployment at any Japanese hospital would produce a regulatory tightening that closes the supervised track for the entire field.
Whether Rindou Medical Intelligence's differential diagnosis capability, validated in an attended environment across 1,840 encounters, holds the attending-physician agreement rates recorded there (above 78 per cent in general medicine and 71 per cent in geriatric multi-comorbidity cases) as deployment extends to higher-volume, less-controlled outpatient settings where the attending clinical review discipline of the supervised ward environment cannot be reproduced at the same intensity; the gate six adverse-outcome sensitivity results were strong, but they were generated on a curated test set, not on live patient volume.
Whether the SHINGO framework — developed for the network's specific patient population and clinical data environment — is transferable to other Japanese academic medical centres, and whether the MHLW's evaluation division formalises any part of its gate structure as a reference methodology in future PMD Act guidance; the clinical informatics working group within the Japan Medical Association is understood to be reviewing SHINGO's gate architecture as a candidate for a national clinical AI evaluation standard, but the process has no published timeline.
Whether Omensha Systems' polypharmacy reasoning module — the only component of the deployment with a prior MHLW listing — expands its network of hospital clients beyond the three sites, which would give Omensha a real-world performance dataset spanning multiple clinical environments and demographic profiles, accelerating the evidence base for its module's regulatory extension to additional clinical use cases currently outside its listed scope.
Whether Japan's government-mandated digital health reform programme — the digitalisation of medical records and the establishment of the マイナ保険証 (My Number Card health insurance) linked medical data infrastructure — creates a longitudinal patient data environment that raises the ceiling on diagnostic agent performance in Japan, or whether the slow adoption rate among smaller clinics and regional hospitals produces a data quality gap analogous to the one Singapore's Healthier SG programme encountered at the primary-tertiary interface.

Frequently asked

Why did international vendors with FDA clearance fail the evaluation?: The failure was not at gate one, general clinical case performance, where two international vendors were competitive. It was at gate two: keigo-register fidelity. Japanese clinical documentation operates in a formal honorific and technical register that differs materially from patient-facing Japanese and from clinical Japanese derived from translated English training data. None of the four international vendors produced outputs that the physician review panel judged clinically appropriate at the required threshold; even the most capable vendor's outputs were described by three of the five panel members as appropriate for patient communication but unsuitable for physician-to-physician clinical documentation. That distinction is not a stylistic preference. In a Japanese hospital system, documentation in the wrong register creates liability exposure, reduces chart review reliability, and undermines the attending clinician's trust in the tool. The evaluation panel's rejection was categorical, not marginal.
What is the supervised deployment classification, and how does it work in practice?: The MHLW's November 2023 PMD Act guidance created a supervised deployment track that allows an AI diagnostic agent to generate clinical outputs in an attended setting before full MHLW product listing is granted. In practice, at this network, it means that Rindou's differential diagnosis outputs appeared in the attending physician's clinical workstation interface with a mandated disclosure flag identifying them as outputs from a system under MHLW evaluation. The attending physician was required to acknowledge the output explicitly — a logged action — before any element of it could be entered in the electronic medical record. The network submitted monthly encounter logs to the MHLW's Medical Device Evaluation and Licensing Division throughout the supervised period. Full MHLW listing was granted in April 2024, at which point the mandatory acknowledgement step was replaced by an opt-out logging mechanism. Hayashi's team retains the opt-out log as part of the extended audit trail regardless of regulatory requirement.
How does the deployment address the specific clinical challenges of Japan's aging population?: Two design decisions address aging-society priorities directly. The first is SHINGO gate six, geriatric-specific adverse outcome sensitivity, which was built to over-represent the clinical presentations that carry the highest consequence in elderly Japanese patients: delirium, aspiration pneumonia, fall risk cascades, and acute kidney injury in polypharmacy contexts. Rindou's system was required to demonstrate graceful output degradation, meaning a reduced confidence tier with an explicit flag identifying the data gap, rather than a confident incorrect output when input data was incomplete. Incomplete input is a common feature of geriatric presentations, where functional history is often provided by a family carer rather than the patient. Omensha's polypharmacy module addresses the second priority: Japan's kōki kōreisha (後期高齢者) population, adults aged 75 and above, carries medication loads that create drug-drug interaction risk profiles that standard clinical decision support tools, trained on Western prescribing data, do not reliably identify in the Japanese therapeutic context.
How is clinical liability allocated when a physician acts on an agent's output?: Under the MHLW's November 2023 guidance and the network's internal governance framework, clinical accountability for any decision made in response to a diagnostic agent output remains with the attending physician. The agent's MHLW listing is conditional on it operating within a validated scope defined by the SHINGO gate parameters. The network's audit trail captures whether the agent was operating within its validated scope at the time of each interaction. If a clinical safety incident triggers review and the audit trail shows the agent was operating outside validated scope — for instance, producing a differential on a clinical presentation type excluded from its gate one case set — liability exposure shifts toward the deploying institution and, potentially, the vendor. The network's legal counsel, the network's insurer, and the MHLW's legal affairs division reviewed this allocation before the supervised deployment commenced. No amendment to the Medical Practitioners Act was required; the existing professional liability framework was interpreted to accommodate it by treating agent output as a diagnostic reference tool equivalent in legal status to a published clinical guideline.
Is the SHINGO framework available to other Japanese hospital systems?: Not publicly. The framework's gate structure and threshold parameters are described in the network's internal procurement documentation, which has not been published. The MHLW's November 2023 guidance references evaluation methodology consistent with SHINGO's gate architecture without naming the framework. Hayashi's team has shared the gate structure — not the full technical specification — with the clinical informatics working group within the Japan Medical Association, which is evaluating whether a standardised version should be proposed to the MHLW as a national reference framework. Three other Japanese academic medical centres have requested bilateral framework-sharing discussions with the network's CMIO office. The network's position is that sharing gate structure is appropriate; sharing the proprietary case sets and threshold calibration data — which reflects the network's own patient population — is not.
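The graceful output degradation requirement described in the answers above, a reduced confidence tier with an explicit flag naming the data gap, can be illustrated with a short Python sketch. The required input fields and the tier labels are hypothetical placeholders, since the article names neither, and the logic shows only the general pattern, not Rindou's implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence

# Hypothetical required inputs; the article does not list the real ones.
REQUIRED_FIELDS = ("medication_list", "functional_history", "renal_function")


@dataclass(frozen=True)
class TieredOutput:
    differential: List[str]
    confidence_tier: str  # "standard" or "reduced" (labels are assumptions)
    data_gaps: List[str]  # explicit flag naming each missing input


def tier_output(differential: Sequence[str],
                inputs: Dict[str, Optional[str]]) -> TieredOutput:
    """Downgrade the confidence tier and name the gaps when inputs are missing."""
    gaps = [name for name in REQUIRED_FIELDS if not inputs.get(name)]
    tier = "reduced" if gaps else "standard"
    return TieredOutput(list(differential), tier, gaps)
```

The point of the pattern is that a missing functional history changes what the clinician sees, a lower tier and a named gap, instead of being silently absorbed into a confident-looking differential.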

The second-order signal

Japan's diagnostic agent deployment is not a story about a hospital purchasing software. It is a story about what happens when a clinical institution decides that the governance problem is the primary product — and builds the evaluation infrastructure before it builds the deployment. The SHINGO framework did not emerge from a procurement process. It preceded one. The MHLW regulatory dialogue did not respond to published guidance. It shaped it. The supervised deployment classification that Rindou operated under is now available to every Japanese hospital system precisely because Hayashi's team negotiated its terms before the November 2023 guidance was finalised.

The second-order effect is already visible in the clinical informatics community. Four Japanese academic medical centres have begun evaluation programmes using gate structures that mirror SHINGO's architecture. The Japan Society of Medical Informatics dedicated a full-day symposium at its November 2023 annual meeting to the keigo-register fidelity problem — a technical requirement that existed in no published clinical AI evaluation framework globally before this network documented it. The MHLW's evaluation division is understood to be drafting a second guidance update, expected in mid-2025, that will incorporate geriatric-specific adverse outcome sensitivity testing as a mandatory element of the multi-step clinical reasoning application pathway — a gate six that every future Japanese clinical AI vendor will need to clear, based on evaluation methodology that this network developed for its own purposes in 2022.

Hayashi's eight-page mandate — the document that has no public existence — established two things that the broader clinical AI field has not yet absorbed. First: that language register is not a feature request; it is a clinical safety requirement. Second: that the institutions that build governance infrastructure before deployment do not merely face a cleaner regulatory path. They write the path that everyone else walks. The network's deployment will scale through 2024 and into 2025. The governance framework it built will scale further.