Waymark study on auditable clinical AI selected as publication feature

Waymark

June 24, 2026

The editorial team at BMJ Health and Care Informatics has selected a study from Waymark's data science team as one of four featured articles on the journal homepage for January through March 2026. Each quarter, the editors choose four papers from the prior three months for this recognition.

The paper, "Mechanistic interpretability of reinforcement learning in Medicaid care coordination," was authored by Sanjay Basu, Sadiq Patel, Parth Sheth, Bhairavi Muralidharan, Namrata Elamaran, Aakriti Kinra, and Rajaie Batniji.

The study addresses one of the central challenges in deploying AI for healthcare. Clinical teams need to understand how an algorithm reaches its recommendations before they can rely on it appropriately. Waymark's care model uses a reinforcement learning system to rank intervention options for rising-risk Medicaid beneficiaries, and care teams review those recommendations and retain final decision authority. The research team audited 250,000 of these decisions, covering 45,000 beneficiaries across Washington, Virginia, and Ohio between July 2023 and June 2025, and built methods to expose the reasoning behind each recommendation.

The analysis revealed seven clinically coherent patterns the system had learned, each connecting a social condition to medical risk. For example, housing instability was linked to respiratory exacerbations, food insecurity to glycemic control challenges, and transportation barriers to missed specialty appointments. These patterns demonstrated that the system had learned clinically meaningful connections grounded in real care dynamics.

The analysis also found that a calibrated safety check cleared roughly 90 percent of decisions as low risk and flagged the remainder for mandatory human review. Residual harm among cleared decisions was 1.2 percent, compared with 6.7 percent among flagged decisions, confirming that the check concentrated risk where human judgment was most needed.
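To make the idea of a safety check concrete, here is a minimal Python sketch of threshold-based triage. It is an illustration under stated assumptions, not the method used in the paper: the `Decision` fields, the `triage` and `harm_rate` helpers, the threshold value, and the toy records are all hypothetical. Only the two residual harm rates (1.2 percent and 6.7 percent) come from the figures reported above.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    risk_score: float  # hypothetical calibrated probability of harm, in [0, 1]
    harmed: bool       # hypothetical observed outcome label


def triage(decisions, threshold):
    """Split decisions into cleared (score below threshold) and flagged (at or above)."""
    cleared = [d for d in decisions if d.risk_score < threshold]
    flagged = [d for d in decisions if d.risk_score >= threshold]
    return cleared, flagged


def harm_rate(group):
    """Fraction of decisions in the group with an observed harm; 0.0 for an empty group."""
    return sum(d.harmed for d in group) / len(group) if group else 0.0


# Toy records, invented purely to exercise the helpers (not study data).
toy = [
    Decision(0.01, False),
    Decision(0.02, False),
    Decision(0.03, False),
    Decision(0.05, False),
    Decision(0.40, True),
]
cleared, flagged = triage(toy, threshold=0.10)  # threshold is an assumed value

# Residual harm rates as reported in the post.
cleared_rate = 0.012
flagged_rate = 0.067
rate_ratio = flagged_rate / cleared_rate  # how much riskier flagged decisions were
```

With the reported rates, `rate_ratio` comes to roughly 5.6, meaning a flagged decision was about five to six times as likely to involve residual harm as a cleared one.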

The team disaggregated performance across demographic groups and applied constrained adjustments that reduced disparities in harm by 28 to 37 percent across race and sex groups, with the system's overall performance preserved.

The team also developed a systematic catalog of where the algorithm's recommendations fell short. The most common shortfall traced to information gaps, cases where the system lacked context that a care team member had on the ground. That finding points toward concrete data infrastructure improvements as the highest-value next step.

Taken together, the methods offer a governance scaffold for clinical AI: transparent reasoning traces, calibrated safety envelopes, and quantified fairness metrics that health systems and regulators can adopt.

Congratulations to the team on this recognition!

Read the full paper.