====== Responsible AI in Healthcare ======

Our team is focused on bringing AI into clinical use safely, ethically, and cost-effectively. Our work is organized into two broad work streams.

===== Creation and adoption of foundation models in medicine =====

Given the high interest in using large language models (LLMs) in medicine, the [[https://jamanetwork.com/journals/jama/fullarticle/2808296|creation and use of LLMs in medicine]] needs to be actively shaped by provisioning relevant training data, specifying the desired benefits, and evaluating the benefits via testing in real-world deployments.

{{ :verify-benefits.png?nolink&400 }}

We study whether commercial language models [[https://arxiv.org/abs/2304.13714|support real-world needs]] or can follow the [[https://medalign.stanford.edu/|medical instructions (MedAlign)]] that clinicians would expect them to follow. We build clinical foundation models such as [[https://www.sciencedirect.com/science/article/pii/S1532046420302653|CLMBR]] and [[https://arxiv.org/abs/2301.03150|MOTOR]], and verify their benefits, such as [[https://www.nature.com/articles/s41598-023-30820-8|robustness over time]], across [[https://pubmed.ncbi.nlm.nih.gov/37639620/|populations]], and across [[https://arxiv.org/abs/2311.11483|sites]]. We release de-identified datasets such as [[https://ehrshot.stanford.edu/|EHRSHOT]] for few-shot evaluation of foundation models, and multi-modal datasets such as [[https://inspect.stanford.edu/|INSPECT]].

===== Making machine learning models clinically useful =====

Whether a classifier or prediction [[https://jamanetwork.com/journals/jama/article-abstract/2748179|model is useful]] in guiding care depends on the interplay between the model's output, the intervention it triggers, and the intervention's benefits and harms. Our work stemmed from an effort [[http://stanmed.stanford.edu/2018summer/artificial-intelligence-puts-humanity-health-care.html|in improving palliative care]] using machine learning. [[https://www.tinyurl.com/hai-blogs|Blog posts at HAI]] summarize our work in an easily accessible manner.

{{ :model-interplay.png?400&nolink& }}

We study how to quantify the [[https://www.sciencedirect.com/science/article/pii/S1532046423000400|impact of work capacity constraints]] on achievable benefit, estimate [[https://www.sciencedirect.com/science/article/pii/S1532046421001544|individualized utility]], and learn [[https://pubmed.ncbi.nlm.nih.gov/34350942/|optimal decision thresholds]]. We question conventional wisdom on whether models [[https://tinyurl.com/donot-explain|need to be explainable]] and [[https://www.nature.com/articles/s41591-023-02540-z|generalizable]]. We examine whether the consequences of [[https://hai.stanford.edu/news/when-algorithmic-fairness-fixes-fail-case-keeping-humans-loop|algorithm-guided care are fair]] and how to [[https://hai.stanford.edu/news/how-do-we-ensure-healthcare-ai-useful|ensure that healthcare models are useful]]. We study this interplay to guide the work of the [[https://dsatshc.stanford.edu/|Data Science Team at Stanford Healthcare]].
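
To make concrete how an intervention's benefits and harms pin down a decision threshold, the textbook decision-analytic result (stated here for illustration, not as the specific method of the linked papers) is that intervening has positive expected utility when the predicted risk p exceeds p* = H / (B + H), where B is the expected benefit of intervening on a true positive and H the expected harm of intervening on a false positive. A minimal sketch, with benefit and harm numbers invented for illustration:

<code python>
# Sketch: how intervention benefit (B) and harm (H) fix the risk threshold.
# Expected utility of intervening at predicted risk p is p*B - (1-p)*H,
# which is positive exactly when p > H / (B + H).
# The B and H values below are invented for illustration only.

def optimal_threshold(benefit_tp: float, harm_fp: float) -> float:
    """Predicted-risk threshold above which intervening pays off in expectation."""
    return harm_fp / (benefit_tp + harm_fp)

# A cheap, low-risk intervention justifies acting on low predicted risk...
print(optimal_threshold(benefit_tp=10.0, harm_fp=1.0))  # ~0.091
# ...while a costly or risky one should be reserved for high predicted risk.
print(optimal_threshold(benefit_tp=2.0, harm_fp=6.0))   # 0.75
</code>

The same arithmetic suggests why one global threshold rarely serves all patients: when B and H vary across individuals, so does the threshold, which is part of what motivates estimating individualized utility.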