Our team focuses on bringing AI into clinical use safely, ethically, and cost-effectively. Our work is organized into two broad work-streams.
Given the high interest in using large language models (LLMs) in medicine, their creation and use need to be actively shaped by provisioning relevant training data, specifying the desired benefits, and evaluating those benefits via testing in real-world deployments.
We study whether commercial language models support real-world needs or can follow the medical instructions (MedAlign) that clinicians would expect them to follow. We build clinical foundation models such as CLMBR and MOTOR and verify their benefits, such as robustness over time, populations, and sites. We release de-identified datasets such as EHRSHOT, for few-shot evaluation of foundation models, and multi-modal datasets such as INSPECT.
Whether a classifier or prediction model is useful in guiding care depends on the interplay between the model's output, the intervention it triggers, and the intervention's benefits and harms. Our work stemmed from an effort to improve palliative care using machine learning. Blog posts at HAI summarize our work in an easily accessible manner.
We study how to quantify the impact of work capacity constraints on achievable benefit, estimate individualized utility, and learn optimal decision thresholds. We question conventional wisdom on whether models need to be explainable and generalizable. We examine whether the consequences of using algorithm-guided care are fair and how to ensure that healthcare models are useful. We study this interplay to guide the work of the Data Science Team at Stanford Healthcare.
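To make the interplay between model output, intervention, and capacity concrete, here is a minimal illustrative sketch. It is not the team's method: the function names, the per-patient benefit and cost values, and the toy data are all hypothetical assumptions. It picks the decision threshold that maximizes total utility and shows how a cap on the number of interventions can lower the achievable benefit.

```python
def utility_at_threshold(y_true, y_score, threshold, benefit_tp, cost_fp, capacity=None):
    """Total utility of intervening on patients scored at or above `threshold`.

    Assumed (hypothetical) utility model: each true positive acted on is worth
    `benefit_tp`, each false positive acted on costs `cost_fp`, and patients
    who are not acted on contribute nothing.
    """
    flagged = [i for i, score in enumerate(y_score) if score >= threshold]
    # With limited capacity, act on the highest-scoring flagged patients first.
    flagged.sort(key=lambda i: y_score[i], reverse=True)
    if capacity is not None:
        flagged = flagged[:capacity]
    true_pos = sum(1 for i in flagged if y_true[i] == 1)
    false_pos = len(flagged) - true_pos
    return benefit_tp * true_pos - cost_fp * false_pos


def best_threshold(y_true, y_score, benefit_tp, cost_fp, capacity=None):
    """Search the observed scores for the utility-maximizing threshold."""
    best_t, best_u = None, float("-inf")
    for t in sorted(set(y_score)):
        u = utility_at_threshold(y_true, y_score, t, benefit_tp, cost_fp, capacity)
        if u > best_u:  # on ties, keep the lowest threshold
            best_t, best_u = t, u
    return best_t, best_u


# Toy data (hypothetical): six patients, outcomes and model scores.
y_true = [1, 0, 1, 0, 1, 0]
y_score = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

unconstrained = best_threshold(y_true, y_score, benefit_tp=1.0, cost_fp=0.5)
constrained = best_threshold(y_true, y_score, benefit_tp=1.0, cost_fp=0.5, capacity=2)
print(unconstrained, constrained)
```

In this toy setting the best threshold and the achievable utility both change once capacity is limited to two interventions, which is the kind of effect that has to be measured before judging whether a model is useful in practice.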