
We develop methods to annotate, index and analyze large unstructured datasets to enable use cases of the learning health system. Our research group is part of the Center for Biomedical Informatics Research at Stanford and the National Center for Biomedical Ontology. Press coverage of our work can be found in Forbes, GigaOM, Science News, EHR Intelligence and the Stanford Medicine magazine. We combine machine learning, text mining, and prior knowledge in medical ontologies to discover hidden trends, build risk models, and drive comparative effectiveness studies that enable data-driven medicine.

We have shown that unstructured data can be used to monitor for adverse drug events, learn drug-drug interactions, identify off-label drug usage, generate practice-based evidence for difficult-to-test clinical hypotheses, identify new medical insights, generate phenotypic fingerprints, and build predictive models. Our work on combining multiple information sources for drug safety surveillance was recently the focus of a commentary titled Advancing the Science of Pharmacovigilance. We have also shown that automated annotations and multiple biomedical ontologies, including disease ontologies, can extend enrichment analysis beyond Gene Ontology annotations alone, helping to interpret the “gene lists” produced by analysis of high-throughput data.
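Enrichment analysis of a gene list against an ontology term is commonly scored with a one-sided hypergeometric test. The sketch below shows that generic test only, assuming the annotations (from Gene Ontology, a disease ontology, or automated text-mined annotation) have already been reduced to counts; the function name `term_enrichment_pvalue` is hypothetical, and this is not necessarily the exact statistic used in the group's published work.

```python
from math import comb


def term_enrichment_pvalue(total_genes: int, genes_with_term: int,
                           list_size: int, overlap: int) -> float:
    """Hypothetical helper: one-sided hypergeometric p-value.

    Probability of observing at least `overlap` term-annotated genes in a
    random list of `list_size` genes drawn without replacement from a
    universe of `total_genes`, of which `genes_with_term` carry the term.
    """
    if not (0 <= genes_with_term <= total_genes and 0 <= list_size <= total_genes):
        raise ValueError("counts must satisfy 0 <= K <= N and 0 <= n <= N")
    # The overlap cannot exceed the list size or the number of annotated genes.
    upper = min(list_size, genes_with_term)
    # Sum the hypergeometric probability mass over the upper tail [overlap, upper].
    tail = sum(
        comb(genes_with_term, i) * comb(total_genes - genes_with_term, list_size - i)
        for i in range(max(overlap, 0), upper + 1)
    )
    return tail / comb(total_genes, list_size)


# Illustrative counts only: a 50-gene list drawn from a 20,000-gene universe,
# 200 genes annotated to some disease term, 8 of them in the list.
print(term_enrichment_pvalue(20_000, 200, 50, 8))
```

In practice each term in the ontology is tested this way and the resulting p-values are corrected for multiple testing.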

Learning Health System examples:

Data mining for drug safety:

  • Pharmacovigilance using clinical notes: Uses the text of clinical notes to detect single drug–adverse event associations (AUC of 80.4%) and drug–drug interactions (AUC of 81.5%).
  • Finding drug-drug interactions: We show that it is feasible to use clinical text to identify adverse events among patients on drug combinations, to estimate their rates, and to find potentially better combinations.
  • Profiling the performance of FAERS: We find that not all events are equally detectable in AERS, and that specific events might be monitored more effectively using other data sources.
  • Pharmacovigilance Using Patient-Generated Data on the Internet: We show that ADR detection via search logs performs comparably to, and complements, detection based on the FDA’s Adverse Event Reporting System (AERS). Search logs achieve an AUC of 0.82 versus 0.81 for AERS, and combining both sources improves performance by 19%.
  • Web scale pharmacovigilance: We find that anonymized signals on drug interactions can be mined from search logs.
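The AUC values above summarize how well a method's signal scores separate known drug–event associations from known non-associations in a reference standard. A minimal sketch of that evaluation, using the Mann–Whitney formulation of AUC (the probability that a randomly chosen true association outscores a randomly chosen negative control, with ties counted as half); the function name `auc` and the scores and labels are hypothetical placeholders, and how each study builds its reference set and signal scores is not described here.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Hypothetical helper: AUC via pairwise comparison (Mann-Whitney U / (n_pos * n_neg)).

    `scores` are signal strengths for drug-event pairs; `labels` mark pairs in
    the reference standard as true associations (1) or negative controls (0).
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pair")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # true association ranked above a negative control
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Placeholder data: four drug-event pairs, two of them true associations.
print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

The quadratic pairwise loop keeps the definition explicit; for large reference sets a rank-based computation gives the same value more efficiently.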

Phenotypic profiling:

Our Group: Lab members
Open Positions: Postdoc position | Data Science Fellow
Onboarding: New lab members, For collaborators


BIOMEDIN 215: Data Driven Medicine, offered in the autumn quarter of each year


start.txt · Last modified: 2014/07/11 15:08 by acallaha