With the spread of electronic health records, increasingly large repositories of clinical and other patient-derived data are being built. These databases are large and difficult for any one specialist to analyze. To find the hidden associations within such data, we review methods for large-scale data mining on electronic medical records, methods in natural language processing and text mining of medical records, and methods for using ontologies to tag unstructured clinical notes.
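To make the last of these methods concrete, here is a minimal sketch of dictionary-based ontology tagging. Each ontology term is matched as a whole word in a clinical note and mapped to a concept identifier. The tiny term list and the concept IDs (`CONCEPT_MI` and so on) are hypothetical placeholders, not codes from a real ontology, and the sketch is only an illustration, not the approach taught in the course.

```python
import re

# Hypothetical mini-lexicon: surface terms mapped to made-up concept IDs.
# A real tagger would load its terms and IDs from an actual ontology.
TERM_TO_CONCEPT = {
    "myocardial infarction": "CONCEPT_MI",
    "heart attack": "CONCEPT_MI",
    "type 2 diabetes": "CONCEPT_T2DM",
    "metformin": "CONCEPT_METFORMIN",
}


def tag_note(note: str, lexicon: dict[str, str]) -> list[tuple[str, str, int]]:
    """Return (term, concept ID, character offset) for every lexicon term found in note."""
    text = note.lower()  # case-insensitive matching
    hits = []
    for term, concept in lexicon.items():
        # \b anchors keep a term from matching inside a longer word.
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text):
            hits.append((term, concept, match.start()))
    # Report hits in the order they occur in the note.
    return sorted(hits, key=lambda hit: hit[2])


if __name__ == "__main__":
    note = "Pt with history of heart attack and type 2 diabetes, started on metformin."
    for term, concept, offset in tag_note(note, TERM_TO_CONCEPT):
        print(f"{offset:3d}  {term!r} -> {concept}")
```

Two synonyms ("heart attack", "myocardial infarction") map to the same concept, which is the main benefit of tagging against an ontology rather than searching for raw strings. Real clinical notes also need handling of abbreviations, misspellings and negation (e.g., "no history of"), which this sketch deliberately ignores.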
SCHEDULE: TUE, THU 2:15 PM - 3:30 PM
LOCATION: TBD for Fall 2012
CREDITS: 3
Discussion Section: Medical School Office Building, X-228, FRI 2:15 PM - 3:05 PM
The course has four modules. The first module reviews the medical data miner's toolkit, including the use of ontologies for data mining. Each of the remaining three modules covers a new application area (e.g., predicting readmissions, drug safety surveillance, clinical text mining) and a new method (e.g., association rules, logistic regression) through a set of five lectures ending in a “mini project” assigned as homework. The course uses real, de-identified, large patient datasets (in the range of millions of patients) that are made available for a final research project associated with the course. This course is also offered in a 1-credit version (BIOMEDIN 225) that meets at the same time but does not require a final project.
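Association rules, one of the methods named above, score a rule "A -> C" mainly by its support (how often A and C occur together across all records) and its confidence (how often C occurs among the records that contain A). The sketch below computes both on a handful of invented patient records. The item names, the thresholds, and the brute-force search over single-item rules are assumptions for illustration; this is a simplification of full algorithms such as Apriori, not the course's own method.

```python
from itertools import permutations

# Made-up records for illustration: the set of coded items seen for each patient.
RECORDS = [
    {"diabetes", "metformin", "hypertension"},
    {"diabetes", "metformin"},
    {"hypertension", "lisinopril"},
    {"diabetes", "metformin", "lisinopril", "hypertension"},
    {"hypertension"},
]


def support(itemset: set[str], records: list[set[str]]) -> float:
    """Fraction of records that contain every item in itemset."""
    return sum(itemset <= record for record in records) / len(records)


def confidence(antecedent: set[str], consequent: set[str], records: list[set[str]]) -> float:
    """Estimate of P(consequent | antecedent): support(A and C) / support(A)."""
    return support(antecedent | consequent, records) / support(antecedent, records)


def pairwise_rules(records, min_support, min_confidence):
    """Brute-force every single-item rule A -> C that clears both thresholds."""
    items = sorted(set().union(*records))
    rules = []
    for a, c in permutations(items, 2):
        s = support({a, c}, records)
        if s < min_support:
            continue  # pair too rare to be worth reporting
        conf = confidence({a}, {c}, records)
        if conf >= min_confidence:
            rules.append((a, c, s, conf))
    return rules


if __name__ == "__main__":
    for a, c, s, conf in pairwise_rules(RECORDS, min_support=0.4, min_confidence=0.7):
        print(f"{a} -> {c}: support={s:.2f}, confidence={conf:.2f}")
```

With these toy records and thresholds it reports three rules: diabetes -> metformin, lisinopril -> hypertension and metformin -> diabetes, each with confidence 1.00. The reverse rule hypertension -> lisinopril is dropped because only half of the hypertension records include lisinopril, which shows why confidence, unlike support, depends on the direction of the rule.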
(Subject to change)
The Petabyte Age Because More Isn't Just More — More Is Different
The Unreasonable Effectiveness of Data
Lecture | Learning Objective | Module | Homework and Project | Date | HW Dataset |
---|---|---|---|---|---|
1 | Introduction, Overview and Relevance | Medical data miner's toolkit | FS, HC | 9/27/2011 | |
2 | Data mining in medicine - I | Medical data miner's toolkit | | 9/29/2011 | |
3 | Data mining in medicine - II | Medical data miner's toolkit | | 10/4/2011 | |
4 | Health Care Utilization Databases + Review of relevant Ontologies | Medical data miner's toolkit | | 10/6/2011 | AERS data |
5 | Introduction to Drug Safety Surveillance | Drug Safety Surveillance | HW-1 out | 10/11/2011 | |
6 | State of the art and Exemplar paper | Drug Safety Surveillance | | 10/13/2011 | |
7 | Other methods | Drug Safety Surveillance | | 10/18/2011 | |
8 | Other possible methods | Drug Safety Surveillance | | 10/20/2011 | |
9 | Project proposals | Drug Safety Surveillance | HW-1 due | 10/25/2011 | |
10 | Introduction to Predicting Readmissions | Predictive data mining | | 10/27/2011 | |
11 | State of the art (Readmissions) | Predictive data mining | HW-2 out | 11/1/2011 | MIMIC II |
12 | Introduction to Co-morbidities and Exemplar paper (Discharge decision / Survival) | Predictive data mining | | 11/3/2011 | |
13 | State of the art (Discharge decision / Survival) | Predictive data mining | | 11/8/2011 | |
14 | Clinical Text Mining: Goals and Key Problems | Clinical Text Mining | | 11/10/2011 | |
15 | History and Review of the state of the art | Clinical Text Mining | HW-2 due, HW-3 out | 11/15/2011 | i2b2 data |
16 | i2b2 NLP challenges | Clinical Text Mining | | 11/17/2011 | |
17 | Non-traditional approaches; How to improve on existing methods | Clinical Text Mining | HW-3 due | 11/29/2011 | |
18 | Wrap-up: What did we learn? What questions remain? | - | | 12/1/2011 | |
19 | PROJECT PRESENTATIONS | Final Project | | 12/6/2011 | |
20 | PROJECT PRESENTATIONS | Final Project | | 12/8/2011 | |
Machine Learning: An Algorithmic Perspective, Stephen Marsland
http://www.amazon.com/Machine-Learning-Algorithmic-Perspective-Recognition/dp/1420067184
Introduction to the Practice of Statistics, David S. Moore and George P. McCabe
http://searchworks.stanford.edu/view/5470778
http://www.amazon.com/Introduction-Practice-Statistics-George-McCabe/dp/071676282X
The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Trevor Hastie, Robert Tibshirani and Jerome Friedman
http://www-stat.stanford.edu/~tibs/ElemStatLearn/
Mining of Massive Datasets, Anand Rajaraman and Jeff Ullman
http://infolab.stanford.edu/~ullman/mmds.html