Large language models in Healthcare

The past year has seen significant advances in artificial intelligence (AI) across modalities such as text, image, and video. Foundation models, which are AI models trained on large, unlabeled datasets and highly adaptable to new applications, are driving these innovations. This new class of models offers opportunities for a better paradigm of doing “AI in healthcare” by providing adaptability with fewer manually labeled examples, modular and robust AI, multimodality, and new interfaces for human-AI collaboration. Read about How Foundation Models Can Advance AI in Healthcare.

Although foundation models (FMs), including large language models (LLMs), have immense potential in healthcare, evaluating their usefulness, fairness, and reliability is challenging because the field lacks shared evaluation frameworks and datasets. Over 80 clinical FMs have been reviewed, but their evaluation regimes do not accurately indicate their clinical value. Until their factual correctness and robustness are ensured, it is difficult to justify the use of FMs in clinical practice. Read about The Shaky Foundations of Foundation Models in Healthcare.

We examined the safety and accuracy of GPT-4 in serving the curbside consultation needs of doctors. Read about How Well Do Large Language Models Support Clinician Information Needs? and check out the arXiv submission at https://arxiv.org/abs/2304.13714

We also evaluated the ability of GPT-4 to generate realistic USMLE Step 2 exam questions by asking licensed physicians to distinguish between AI-generated and human-generated questions and to assess their validity. The results indicate that GPT-4 can create questions that are largely indistinguishable from human-generated ones, with a majority of the questions deemed “valid”. Read more at https://doi.org/10.1101/2023.04.25.23288588

Shah Lab
