====== CLMBR (clinical language modeling based representations) ======
 

This is a 141-million-parameter autoregressive foundation model pretrained on 2.57 million deidentified electronic health records (EHRs) from Stanford Medicine. It is the model from [[https://arxiv.org/abs/2307.02028 | Wornow et al. 2023]] and is based on the CLMBR architecture originally described in [[https://www.sciencedirect.com/science/article/pii/S1532046420302653 | Steinberg et al. 2021]].

As input, this model expects a sequence of coded medical events that have been mapped to Standard Concepts within the OMOP-CDM vocabulary. It outputs patient representations, which can then be used as features for downstream prediction tasks.
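
To make the input description concrete, here is a minimal sketch of one patient timeline as a chronologically ordered sequence of coded events. The ''Event'' class, the ''to_code_sequence'' helper, and the "VOCABULARY/code" string form are assumptions made for illustration, and the three codes are only example values; the exact input format the model requires is specified on the model card, not here.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """One coded medical event on a patient timeline (hypothetical structure)."""
    time: datetime
    code: str  # assumed "VOCABULARY/code" form, for illustration only


def to_code_sequence(events):
    """Return the patient's event codes sorted from earliest to latest."""
    return [e.code for e in sorted(events, key=lambda e: e.time)]


# Illustrative events, deliberately listed out of order.
patient = [
    Event(datetime(2021, 3, 2), "RxNorm/6809"),
    Event(datetime(2020, 11, 15), "SNOMED/73211009"),
    Event(datetime(2021, 3, 1), "LOINC/4548-4"),
]

sequence = to_code_sequence(patient)
print(sequence)
```

The sketch only orders the events; mapping source codes to OMOP Standard Concepts and computing the representation itself are left to the tooling described on the model card.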

For details, see the model card: https://huggingface.co/StanfordShahLab/clmbr-t-base