rail

Differences

This shows you the differences between two versions of the page.

rail [2023/12/05 16:46]
nigam
rail [2024/05/12 10:55] (current)
nigam
Line 1: Line 1:
 ====== Responsible AI in Healthcare ======
  
 +Our team is focused on bringing AI into clinical use safely, ethically and cost-effectively. Our work is organized in two broad workstreams.
  
-===== Making Machine Learning Models Clinically Useful =====
+===== Creation and adoption of foundation models in medicine =====
  
-Whether a classifier or prediction model is useful in guiding care depends on the interplay between the model's output, the intervention it triggers, and the intervention’s benefits and harms.
+Given the high interest in using large language models (LLMs) in medicine, the [[https://jamanetwork.com/journals/jama/fullarticle/2808296|creation and use of LLMs in medicine]] needs to be actively shaped by provisioning relevant training data, specifying the desired benefits, and evaluating the benefits via testing in real-world deployments.
  
-{{  :model-interplay.png?400&nolink&  }}
+{{  :verify-benefits.png?nolink&400  }}
  
-We study this interplay to bring AI to the clinic safely, cost-effectively and ethically, and to inform the work of the [[https://dsatshc.stanford.edu| Data Science Team at Stanford Healthcare]] in performing assessments to ensure that we are creating Fair, Useful, Reliable Models (FURM). [[https://www.tinyurl.com/hai-blogs | Blog posts at HAI]] summarize our work in an easily accessible manner. Our research stemmed from the effort [[http://stanmed.stanford.edu/2018summer/artificial-intelligence-puts-humanity-health-care.html|in improving palliative care]] using machine learning. [[https://jamanetwork.com/journals/jama/fullarticle/2748179?guestAccessKey=8cef0271-616d-4e8e-852a-0fddaa0e5101|Ensuring that machine learning models are clinically useful]] requires [[https://www.nature.com/articles/s41591-019-0651-8|estimating the hidden deployment cost of predictive models]] as well as quantifying the [[http://academic.oup.com/jamia/article/28/6/1149/6045012|impact of work capacity constraints]] on achievable benefit, estimating [[https://www.sciencedirect.com/science/article/pii/S1532046421001544|individualized utility]], and learning [[https://pubmed.ncbi.nlm.nih.gov/34350942/|optimal decision thresholds]].
-Pre-empting [[https://www.nejm.org/doi/full/10.1056/NEJMp1714229|ethical challenges]] often requires keeping [[https://hai.stanford.edu/news/when-algorithmic-fairness-fixes-fail-case-keeping-humans-loop|humans in the loop]] and focusing on examining the [[https://informatics.bmj.com/content/29/1/e100460|consequences of model-guided decision making]] in the presence of clinical care guidelines.
+We study whether commercial language models [[https://arxiv.org/abs/2304.13714|support real-world needs]] or can follow [[https://medalign.stanford.edu/|medical instructions (MedAlign)]] that clinicians would expect them to follow. We build clinical foundation models such as [[https://www.sciencedirect.com/science/article/pii/S1532046420302653|CLMBR]] and [[https://arxiv.org/abs/2301.03150|MOTOR]], and verify their benefits, such as [[https://www.nature.com/articles/s41598-023-30820-8|robustness over time]], [[https://pubmed.ncbi.nlm.nih.gov/37639620/| populations]] and [[https://arxiv.org/abs/2311.11483|sites]]. We release de-identified datasets such as [[https://ehrshot.stanford.edu/|EHRSHOT]] for few-shot evaluation of foundation models and multi-modal datasets such as [[https://inspect.stanford.edu/| INSPECT]].
  
  
-===== Creation and Adoption of Foundation Models in Medicine =====
+===== Making machine learning models clinically useful =====
 + 
 +Whether a classifier or prediction [[ https://jamanetwork.com/journals/jama/article-abstract/2748179 | model is useful]] in guiding care depends on the interplay between the model's output, the intervention it triggers, and the intervention’s benefits and harms. Our work stemmed from the effort [[http://stanmed.stanford.edu/2018summer/artificial-intelligence-puts-humanity-health-care.html|in improving palliative care]] using machine learning. [[https://www.tinyurl.com/hai-blogs | Blog posts at HAI]] summarize our work in an easily accessible manner.
 + 
 +{{  :model-interplay.png?400&nolink&  }}
  
 +We study how to quantify the [[https://www.sciencedirect.com/science/article/pii/S1532046423000400|impact of work capacity constraints]] on achievable benefit, estimate [[https://www.sciencedirect.com/science/article/pii/S1532046421001544|individualized utility]], and learn [[https://pubmed.ncbi.nlm.nih.gov/34350942/|optimal decision thresholds]]. We question conventional wisdom on whether models [[https://tinyurl.com/donot-explain | need to be explainable]] and [[https://www.nature.com/articles/s41591-023-02540-z |generalizable]]. We examine whether the consequences of using [[https://hai.stanford.edu/news/when-algorithmic-fairness-fixes-fail-case-keeping-humans-loop | algorithm-guided care are fair]] and how to [[https://hai.stanford.edu/news/how-do-we-ensure-healthcare-ai-useful | ensure that healthcare models are useful]]. We study this interplay to guide the work of the [[https://dsatshc.stanford.edu/ | Data Science Team at Stanford Healthcare]].
  
-Given the high interest in using large language models (LLMs) in medicine, the [[https://jamanetwork.com/journals/jama/fullarticle/2808296 | creation and use of LLMs in medicine]] needs to be actively shaped by provisioning relevant training data, specifying the desired benefits, and evaluating the benefits via testing in real-world deployments. 
  
-{{  :verify-benefits.png?400&nolink&  }} 
  
-We build clinical foundation models such as [[https://www.sciencedirect.com/science/article/pii/S1532046420302653 | CLMBR]] and [[https://arxiv.org/abs/2301.03150 | MOTOR]], and verify benefits such as [[https://www.nature.com/articles/s41598-023-30820-8 | robustness over time]], [[https://pubmed.ncbi.nlm.nih.gov/37639620/ | populations]] and [[https://arxiv.org/abs/2311.11483 | sites]]. In addition, we make available de-identified datasets such as [[https://ehrshot.stanford.edu/ | EHRSHOT]] for few-shot evaluation of foundation models as well as for benchmarking instruction following by commercial LLMs ([[https://medalign.stanford.edu/ | MedAlign]]). We also conduct research to assess whether commercial language models [[https://arxiv.org/abs/2304.13714  | support real-world needs]].
  
rail.1701823602.txt.gz · Last modified: 2023/12/05 16:46 by nigam