Our team focuses on bringing AI into clinical use safely, ethically, and cost-effectively. Our work is organized into two broad workstreams.
Given the high interest in using large language models (LLMs) in medicine, their creation and use need to be actively shaped by provisioning relevant training data, specifying the desired benefits, and evaluating those benefits through testing in real-world deployments.
We study whether commercial language models meet real-world needs and whether they follow the medical instructions that clinicians would expect them to follow. We build clinical foundation models, such as CLMBR and MOTOR, and verify their benefits, such as robustness across time, populations, and sites. In addition, we make de-identified datasets such as EHRSHOT available for few-shot evaluation of foundation models, and we are working to release multimodal datasets such as INSPECT.
Whether a classifier or prediction model is useful for guiding care depends on the interplay between the model's output, the intervention it triggers, and that intervention's benefits and harms.
We study this interplay to inform the work of the Data Science Team at Stanford Healthcare. This line of work grew out of an effort to improve palliative care using machine learning, and blog posts at HAI summarize it in an accessible way. Ensuring that machine learning models are clinically useful requires quantifying the impact of work capacity constraints on achievable benefit, estimating individualized utility, and learning optimal decision thresholds.
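To make the interplay between model output, intervention, and benefit concrete, here is a minimal decision-analytic sketch. It is a generic illustration, not the team's published method, and it rests on stated assumptions: risk scores are calibrated probabilities, and the per-patient utilities `benefit_tp` (gain from intervening on a patient who has the outcome) and `harm_fp` (cost of intervening on one who does not) are hypothetical known constants. All names are hypothetical. The sketch computes the break-even risk threshold, the expected utility achievable when a team can act on at most `capacity` patients, and a simple grid search over observed risk values for a threshold.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Utilities:
    """Hypothetical per-patient utilities, in arbitrary units (an assumption)."""
    benefit_tp: float  # gain from intervening on a patient who has the outcome
    harm_fp: float     # cost of intervening on a patient who does not


def expected_utility(risk: float, u: Utilities) -> float:
    """Expected utility of intervening, assuming `risk` is a calibrated probability."""
    return risk * u.benefit_tp - (1.0 - risk) * u.harm_fp


def break_even_threshold(u: Utilities) -> float:
    """Risk above which intervening has positive expected utility (no capacity limit)."""
    return u.harm_fp / (u.benefit_tp + u.harm_fp)


def achievable_utility(risks: Sequence[float], threshold: float, u: Utilities,
                       capacity: Optional[int] = None) -> float:
    """Total expected utility of intervening on flagged patients.

    Patients with risk >= threshold are flagged. If `capacity` is set and more
    patients are flagged than the team can serve, only the highest-risk ones
    receive the intervention.
    """
    flagged = sorted((r for r in risks if r >= threshold), reverse=True)
    served = flagged if capacity is None else flagged[:capacity]
    return sum(expected_utility(r, u) for r in served)


def best_threshold(risks: Sequence[float], u: Utilities,
                   capacity: Optional[int] = None) -> float:
    """Simple grid search: the observed risk value that maximizes achievable utility.

    Ties are broken in favor of the lowest threshold.
    """
    return max(sorted(set(risks)),
               key=lambda t: achievable_utility(risks, t, u, capacity))


if __name__ == "__main__":
    u = Utilities(benefit_tp=10.0, harm_fp=2.0)  # hypothetical values
    risks = [0.9, 0.7, 0.4, 0.2, 0.05]           # hypothetical model outputs
    t = break_even_threshold(u)
    print(round(t, 3))
    print(round(achievable_utility(risks, t, u), 2))              # unlimited capacity
    print(round(achievable_utility(risks, t, u, capacity=2), 2))  # capacity of 2
    print(best_threshold(risks, u))
```

With these made-up numbers the break-even risk is 1/6. Serving every patient above it yields an expected utility of 18.4, while a capacity of two patients caps the achievable utility at 15.2. This shows how a work capacity constraint, and not only model accuracy, limits the benefit a model can deliver.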