rail [2023/12/05 16:33] nigam |
rail [2024/05/12 10:55] (current) nigam |
====== Responsible AI in Healthcare ====== | ====== Responsible AI in Healthcare ====== |
| |
In healthcare, "Standard AI" models estimate the risk of having some underlying condition or developing it in the future. Whether a model is useful depends on the interplay between the model's output, the intervention it triggers, and the intervention’s benefits and harms. We study this interplay to bring AI to the clinic safely, cost-effectively and ethically, and to inform the work of the [[https://dsatshc.stanford.edu/ | Data Science Team at Stanford Healthcare]]. | Our team is focused on bringing AI into clinical use, safely, ethically and cost-effectively. Our work is organized in two broad work-streams. |
| |
{{ :model-interplay.png?400&nolink& }} | ===== Creation and adoption of foundation models in medicine ===== |
| |
[[https://www.tinyurl.com/hai-blogs | Blog posts at HAI]] summarize our work in an easily accessible manner. Our research stemmed from the effort [[http://stanmed.stanford.edu/2018summer/artificial-intelligence-puts-humanity-health-care.html|in improving palliative care]] using machine learning. [[https://jamanetwork.com/journals/jama/fullarticle/2748179?guestAccessKey=8cef0271-616d-4e8e-852a-0fddaa0e5101|Ensuring that machine learning models are clinically useful]] requires [[https://www.nature.com/articles/s41591-019-0651-8| estimating the hidden deployment cost of predictive models]] as well as quantifying the [[http://academic.oup.com/jamia/article/28/6/1149/6045012|impact of work capacity constraints]] on achievable benefit, estimating [[https://www.sciencedirect.com/science/article/pii/S1532046421001544|individualized utility]], and learning [[https://pubmed.ncbi.nlm.nih.gov/34350942/|optimal decision thresholds]]. Pre-empting [[https://www.nejm.org/doi/full/10.1056/NEJMp1714229|ethical challenges]] often requires keeping [[https://hai.stanford.edu/news/when-algorithmic-fairness-fixes-fail-case-keeping-humans-loop|humans in the loop]] and focusing on examining the [[https://informatics.bmj.com/content/29/1/e100460|consequences of model-guided decision making]] in the presence of clinical care guidelines. | Given the high interest in using large language models (LLMs) in medicine, the [[https://jamanetwork.com/journals/jama/fullarticle/2808296|creation and use of LLMs in medicine]] needs to be actively shaped by provisioning relevant training data, specifying the desired benefits, and evaluating the benefits via testing in real-world deployments. |
---- | |
| |
Recently, there has been high interest in using large language models (LLMs) in medicine. However, the [[https://jamanetwork.com/journals/jama/fullarticle/2808296 | creation and use of LLMs in medicine]] need to be actively shaped by provisioning relevant training data, specifying the desired benefits, and evaluating the benefits via testing in real-world deployments. | {{ :verify-benefits.png?nolink&400 }} |
| |
{{ :verify-benefits.png?400&nolink& }} | We study whether commercial language models [[https://arxiv.org/abs/2304.13714|support real-world needs]] or can follow [[https://medalign.stanford.edu/|medical instructions (MedAlign)]] that clinicians would expect them to follow. We build clinical foundation models such as [[https://www.sciencedirect.com/science/article/pii/S1532046420302653| CLMBR]] and [[https://arxiv.org/abs/2301.03150| MOTOR]], and verify their benefits such as [[https://www.nature.com/articles/s41598-023-30820-8| robustness over time]], [[https://pubmed.ncbi.nlm.nih.gov/37639620/| populations]] and [[https://arxiv.org/abs/2311.11483| sites]]. We release de-identified datasets such as [[https://ehrshot.stanford.edu/| EHRSHOT]] for few-shot evaluation of foundation models and multi-modal datasets such as [[https://inspect.stanford.edu/| INSPECT]]. |
| |
We build clinical foundation models ([[https://www.sciencedirect.com/science/article/pii/S1532046420302653 | CLMBR]], [[https://arxiv.org/abs/2301.03150 | MOTOR]]) and verify benefits such as [[https://www.nature.com/articles/s41598-023-30820-8 | robustness over time]], [[https://pubmed.ncbi.nlm.nih.gov/37639620/ | populations]] and [[https://arxiv.org/abs/2311.11483 | sites]]. In addition, we make available de-identified datasets ([[https://ehrshot.stanford.edu/ | EHRSHOT]]) for few-shot evaluation of foundation models as well as for benchmarking instruction following by commercial LLMs ([[https://medalign.stanford.edu/ | MedAlign]]). | |
| |
We also conduct research to assess whether commercial language models support real-world needs. | ===== Making machine learning models clinically useful ===== |
| |
| Whether a classifier or prediction [[ https://jamanetwork.com/journals/jama/article-abstract/2748179 | model is useful]] in guiding care depends on the interplay between the model's output, the intervention it triggers, and the intervention’s benefits and harms. Our work stemmed from the effort [[http://stanmed.stanford.edu/2018summer/artificial-intelligence-puts-humanity-health-care.html|in improving palliative care]] using machine learning. [[https://www.tinyurl.com/hai-blogs | Blog posts at HAI]] summarize our work in an easily accessible manner. |
| |
| {{ :model-interplay.png?400&nolink& }} |
| |
| We study how to quantify the [[https://www.sciencedirect.com/science/article/pii/S1532046423000400|impact of work capacity constraints]] on achievable benefit, estimate [[https://www.sciencedirect.com/science/article/pii/S1532046421001544|individualized utility]], and learn [[https://pubmed.ncbi.nlm.nih.gov/34350942/|optimal decision thresholds]]. We question conventional wisdom on whether models [[https://tinyurl.com/donot-explain | need to be explainable]] and [[https://www.nature.com/articles/s41591-023-02540-z |generalizable]]. We examine whether the consequences of using [[https://hai.stanford.edu/news/when-algorithmic-fairness-fixes-fail-case-keeping-humans-loop | algorithm-guided care are fair]] and how to [[https://hai.stanford.edu/news/how-do-we-ensure-healthcare-ai-useful | ensure that healthcare models are useful]]. We study this interplay to guide the work of the [[https://dsatshc.stanford.edu/ | Data Science Team at Stanford Healthcare]]. |
| |
---- | |
| |
{{youtube>GNTIoEADfY4?small | Artificial Intelligence transforms health care}} | |
| |
Russ Altman and Nigam Shah take an in-depth look at the growing influence of “data-driven medicine.” | |
| |