With the spread of electronic health records, increasingly large repositories of clinical and other patient-derived data are being built. These databases are large and difficult for any one specialist to analyze. To find the hidden associations within such data, we review methods for large-scale data mining on electronic medical records, methods in natural language processing and text mining of medical records, and methods for using ontologies to tag unstructured clinical notes.

SCHEDULE: TUE, THU 2:15 PM - 3:30 PM
LOCATION: TBD for Fall 2012
Discussion Section: Medical School Office Building, X-228, FRI, 2:15 - 3:05

The course has four modules. The first module reviews the medical data miner’s toolkit, including the use of ontologies for data mining. Each of the remaining three modules reviews one problem area and the computational methods used in it, through a set of 5 lectures ending in a “mini project” as homework. Each module covers a new application area (e.g., predicting readmission, drug safety surveillance, clinical text mining) and a new method (e.g., association rules, logistic regression). The course uses real, de-identified, large patient datasets (on the order of millions of patients), which are made available for a final research project associated with the course. The course is also offered in a 1-credit version (BIOMEDIN 225), which meets at the same time but does not require a final project.

Course Flyer

Course Materials

Schedule and Syllabus

(Subject to change)

The Petabyte Age Because More Isn't Just More — More Is Different
The Unreasonable Effectiveness of Data

| Lecture | Learning Objective | Module | Homework and Project | Date | HW Dataset |
| 1 | Introduction, Overview and Relevance | Medical Dataminer's toolkit | FS, HC | 9/27/2011 | |
| 2 | Data mining in medicine - I | Medical Dataminer's toolkit | | 9/29/2011 | |
| 3 | Data mining in medicine - II | Medical Dataminer's toolkit | | 10/4/2011 | |
| 4 | Health Care Utilization Databases + Review of relevant Ontologies | Medical Dataminer's toolkit | | 10/6/2011 | AERS data |
| 5 | Introduction to Drug Safety Surveillance | Drug Safety Surveillance | HW-1 out | 10/11/2011 | |
| 6 | State of the art and Exemplar paper | Drug Safety Surveillance | | 10/13/2011 | |
| 7 | Other methods | Drug Safety Surveillance | | 10/18/2011 | |
| 8 | Other possible methods | Drug Safety Surveillance | | 10/20/2011 | |
| 9 | Project proposals | Drug Safety Surveillance | HW-1 due | 10/25/2011 | |
| 10 | Introduction to Predicting Readmissions | Predictive data-mining | | 10/27/2011 | |
| 11 | State of the art (Readmissions) | Predictive data-mining | HW-2 out | 11/1/2011 | MIMIC II |
| 12 | Intro. to Co-morbidities and Exemplar paper (Discharge decision / Survival) | Predictive data-mining | | 11/3/2011 | |
| 13 | State of the art (Discharge decision / Survival) | Predictive data-mining | | 11/8/2011 | |
| 14 | Clinical Text Mining: Goals and Key Problems | Clinical Text Mining | | 11/10/2011 | |
| 15 | History and Review of the state of the art | Clinical Text Mining | HW-2 due, HW-3 out | 11/15/2011 | i2b2 data |
| 16 | i2b2 NLP challenges | Clinical Text Mining | | 11/17/2011 | |
| 17 | Non-traditional approaches; How to improve on existing methods | Clinical Text Mining | HW-3 due | 11/29/2011 | |
| 18 | Wrap-up: What did we learn? What questions remain? | - | | 12/1/2011 | |
| 19 | PROJECT PRESENTATIONS | | Final Project | 12/6/2011 | |
| 20 | PROJECT PRESENTATIONS | | Final Project | 12/8/2011 | |

Useful References

Machine learning: an algorithmic perspective, Stephen Marsland

Introduction to the practice of statistics, David S. Moore, George P. McCabe

The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Trevor Hastie, Robert Tibshirani and Jerome Friedman

Mining of Massive Datasets, Anand Rajaraman and Jeff Ullman

biomedin215-2011.txt · Last modified: 2012/08/10 10:57 by nigam