
The widespread adoption of electronic health records (EHRs) has created a new source of “big data”—namely, the record of routine clinical practice—as a by-product of care. This data source offers tremendous opportunities to revolutionize healthcare in the clinic and at the bedside and to advance our understanding of medicine. This graduate class will teach you how to use EHR and other patient data for better healthcare.

The course has five modules, with four lectures in each module. The first module will review the medical data miner's toolkit, including the use of ontologies in data mining and healthcare utilization databases. Each of the remaining four modules will cover one problem area and the computational methods used in it, and will end with a "mini project" as homework. Each of these modules introduces a new application area (e.g., drug safety surveillance, predictive analytics) and new methods (e.g., association rules, logistic regression). In addition, eight discussion sections provide in-depth explanations of the methods referred to in the lectures. For 2015, these discussions will be recorded and made available to SCPD (and remote) students.

The course will use large, real, de-identified patient datasets for the homework projects. The course is also offered as a 2-credit version (BIOMEDIN 225), which meets at the same time but requires only one homework assignment, using a public dataset.

Prerequisites: CS 106A; familiarity with statistics and biology.
Highly recommended: STATS 216.
Recommended: one of CS 246, STATS 305, HRP 258 or CS 229.

Schedule and Syllabus

Schedule: TUE, THU 1:30 PM - 2:50 PM
Lectures and discussions: Skilling Auditorium (Fall 2015). Lectures and discussions are recorded.
Videos: posted about two hours after class ends.
Office hours:

  • Thursday 3:00 - 4:00 PM (in Medical School Office Building, Student Lounge)
  • Wednesday 2:30 - 3:30 PM (in Medical School Office Building, Student Lounge)
  • TAs: David Moskowitz (dmosk AT …), Sarah Poole (spoole AT …), Vibhu Agarwal (vibhua AT …)

All homework assignments will be due before the start of lecture (2:15 pm) on the day the next homework is released.

Course Materials

Miscellaneous References


Older version of this page (from when the course had year-end projects)

Machine Learning: An Algorithmic Perspective, Stephen Marsland

Introduction to the Practice of Statistics, David S. Moore and George P. McCabe

The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Trevor Hastie, Robert Tibshirani and Jerome Friedman

Mining of Massive Datasets, Anand Rajaraman and Jeff Ullman

The Petabyte Age Because More Isn't Just More — More Is Different
The Unreasonable Effectiveness of Data
A Few Useful Things to Know About Machine Learning, Pedro Domingos (link works only with on-campus access; same content in …)

biomedin215.txt · Last modified: 2015/09/25 19:26 by vibhua