We combine machine learning, text-mining, and prior knowledge in medical ontologies to discover hidden trends, build risk models, and drive comparative effectiveness studies to enable the learning health system. Our research group is part of the Center for Biomedical Informatics Research at Stanford and the National Center for Biomedical Ontology. Press coverage of our work can be found in Forbes, GigaOM, Science News, EHR Intelligence and the Stanford Medicine magazine.


We have shown that unstructured data can be used to monitor for adverse drug events, learn drug-drug interactions, identify off-label drug usage, generate practice-based evidence for difficult-to-test clinical hypotheses, identify new medical insights, generate phenotypic fingerprints, and build predictive models. Our efforts in drug safety surveillance were recently the focus of a commentary titled "Advancing the Science of Pharmacovigilance".

Learning Health System examples:

Data mining for drug safety:

  • Pharmacovigilance using clinical notes: Uses textual clinical notes for detecting single drug–adverse event associations (AUC of 80.4%) and for detecting drug–drug interactions (AUC of 81.5%).
  • Finding drug-drug interactions: We show that it is feasible to identify adverse events among patients on drug combinations, and to estimate their rate, from clinical text. We also show that it is feasible to find potentially better combinations.
  • Profiling the performance of FAERS: We find that not all events are equally detectable in the FDA Adverse Event Reporting System (FAERS), and that specific events might be monitored more effectively using other data sources.
  • Pharmacovigilance using patient-generated data on the Internet: We show that adverse drug reaction (ADR) detection via search logs performs comparably to, and complements, detection based on the FDA's adverse event reporting system (AERS): AUC of 0.82 from search logs vs. 0.81 from AERS, improved by 19% when both sources are combined.
  • Web scale pharmacovigilance: We find that anonymized signals on drug interactions can be mined from search logs.
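
The AUC figures quoted above measure how well a method ranks known drug-event associations ahead of negative controls. As a minimal sketch of that metric only (not the pipeline used in the studies listed), the function below computes AUC as the fraction of positive/negative pairs in which the positive pair receives the higher score. The function name `auc`, the scores, and the labels are all hypothetical and made up for illustration; they are not results from our work.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Return the ROC AUC of `scores` against binary `labels`.

    Computed as the probability that a randomly chosen positive item
    outscores a randomly chosen negative item, with ties counted as
    half a win (the Mann-Whitney U formulation).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative item")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical signal scores for six drug-event pairs (illustrative only).
example_scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.1]
# 1 = association listed in a reference set, 0 = negative control (assumed).
example_labels = [1, 1, 1, 0, 0, 0]

print(f"AUC = {auc(example_scores, example_labels):.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so values such as 0.804 or 0.82 indicate that true associations are usually, but not always, ranked above the controls.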

Phenotypic profiling:

Effectiveness of large datasets and simple methods:

Group information


BIOMEDIN 215: Data Driven Medicine (Autumn quarter of each year)



start.txt · Last modified: 2015/05/19 17:23 by nigam