
Open RA positions.

Position 1: Machine Learning Engineer for point of care model deployments

Description
We are looking for a Machine Learning Engineer / Data Scientist to work on exciting challenges at the intersection of Machine Learning and Healthcare. This is a unique opportunity to work on Machine Learning models deployed on live Electronic Health Record (EHR) data. These models enable and support various hospital functions and clinical workflows, impacting thousands of patients each day.

Responsibilities

Understanding clinical requirements and translating them into technical problem statements.
Communicating results and observations to technical audiences as well as clinicians in the form of visualizations, presentations and reports.
Designing and implementing machine learning models to solve real world problems.
Working with live data streams as well as large data repositories to enable training and inference of machine learning models.
Developing software that interacts with various hospital IT systems.

Requirements

5+ years of experience in software design and development.
2+ years of hands-on experience using Python-based machine learning libraries such as scikit-learn, TensorFlow, and PyTorch.
Experience working in a Linux environment and comfort with UNIX command-line tools.
Familiarity with productivity tools such as Git and Docker.
Familiarity with SQL, REST, and web programming.
In-depth conceptual understanding as well as hands-on experience with several supervised and unsupervised machine learning algorithms, such as Random Forests, Logistic Regression, Gradient Boosting, Neural Networks, PCA, K-means, etc.

Strongly preferred:

Prior experience with production deployment of software systems and/or machine learning systems.
Educational background involving quantitative techniques (CS, EE, Math, Statistics, etc.)

Position 2: Software engineer for public release of code

Description

Our team has built a state-of-the-art EHR representation learning technique named CLMBR. We are looking to recruit a research assistant (RA) to assist in developing publicly releasable code for broad use of CLMBR.

Deploying risk-stratification models in the clinic requires addressing questions about the robustness of large, pre-trained models, such as characterizing their reliance on memorization and spurious correlations, as well as addressing issues of fairness and biases in training data. Pre-training via self-supervised representation learning (such as in BERT and GPT-3) has led to exciting advances in training models with limited labeled data. However, many questions remain on how to evaluate representation learning methods (such as CLMBR) when they are used to learn patient representations from electronic health record (EHR) data that feed a broad set of risk-stratification models.

The successful candidate for this RA position would be supervised by research scientists who are experts in representation learning, transfer learning and weak-supervision across multiple modalities of data. The RA will be responsible for implementing code for an open source API to enable rapid prototyping and evaluation of risk-stratification models built using CLMBR from Stanford's standardized EHR data repository STARR. This position provides a unique opportunity to explore machine learning in healthcare while working closely with both computational and clinical experts to develop tools for quickly building and evaluating clinical machine learning models.

Research Focus Areas

Developing robustness evaluations of EHR-based representation models
Contrastive learning with multi-modal EHR data (text + tabular data)

Required Skills

5+ years of experience in software design and development.
2+ years of hands-on experience using Python-based machine learning libraries such as scikit-learn, TensorFlow, and PyTorch.
Strong communication skills and prior research experience.
Experience working in a Linux environment and comfort with UNIX command-line tools.
Familiarity with productivity tools such as Git and Docker.

Preferred Skills

Familiarity with Google Cloud Platform (GCP) services such as BigQuery.
Prior experience working with Stanford's STARR OMOP data.

Relevant Papers

CLMBR - Language models are an effective representation learning technique for electronic health record data https://arxiv.org/abs/2001.05295

Shah Lab