My research group develops methods to analyze large unstructured data sets for data-driven medicine. We use ontology-based approaches to annotate, index, and analyze Big Data in biomedicine, with the aim of enabling data-driven decision making in medicine and health care. Our group is part of the Center for Biomedical Informatics Research at Stanford and the National Center for Biomedical Ontology.
Data-driven medicine: The goal of this research is to combine machine learning and text mining with prior knowledge encoded in medical ontologies to discover hidden trends, build risk models, and support data-driven decision making and comparative effectiveness studies. We have developed methods that transform unstructured patient notes into a de-identified, temporally ordered patient-feature matrix (imagine each row as a patient and each column as a medical concept, with 1 = present and 0 = absent). With the resulting high-throughput data, we can monitor for adverse drug events, identify off-label drug usage, uncover ‘natural experiments’, build predictive models, and generate practice-based evidence for difficult-to-test clinical hypotheses.
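To make the patient-feature matrix concrete, here is a minimal sketch in Python. It assumes the notes have already been de-identified and annotated with ontology concepts. The tuple layout, the toy records, and the function name `build_patient_feature_matrix` are hypothetical and serve only as illustration; this is not the group's actual pipeline.

```python
from datetime import date

# Hypothetical, already de-identified annotations extracted from notes:
# (patient_id, note_date, medical_concept). Toy values for illustration only.
annotations = [
    ("p1", date(2010, 1, 5), "warfarin"),
    ("p1", date(2010, 3, 2), "bleeding"),
    ("p2", date(2011, 6, 9), "warfarin"),
    ("p3", date(2009, 2, 1), "bleeding"),
]

def build_patient_feature_matrix(annotations):
    """Return (patients, concepts, matrix, first_seen).

    matrix[i][j] is 1 if concept j was mentioned for patient i, else 0.
    first_seen maps (patient, concept) to the earliest note date, which
    preserves the temporal ordering of each patient's features.
    """
    patients = sorted({p for p, _, _ in annotations})
    concepts = sorted({c for _, _, c in annotations})
    col = {c: j for j, c in enumerate(concepts)}
    rows = {p: [0] * len(concepts) for p in patients}
    first_seen = {}
    for p, d, c in annotations:
        rows[p][col[c]] = 1
        key = (p, c)
        if key not in first_seen or d < first_seen[key]:
            first_seen[key] = d
    matrix = [rows[p] for p in patients]
    return patients, concepts, matrix, first_seen

patients, concepts, matrix, first_seen = build_patient_feature_matrix(annotations)
for p, row in zip(patients, matrix):
    print(p, dict(zip(concepts, row)))
```

Keeping the first-mention date alongside the binary matrix is one simple way to ask ordering questions, such as whether a drug mention precedes an adverse event mention for the same patient.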
Annotation Analytics: To interpret the “gene lists” that result from analysis of high-throughput data, researchers routinely use Gene Ontology-based analyses. With methods for automated annotation now available and over 200 biomedical ontologies in existence, it’s time for “big data” mining in annotation analysis. For example, by annotating known protein mutations with disease terms, we identified a class of diseases, blood coagulation disorders, that is associated with depletion in substitutions at O-linked glycosylation sites.
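Enrichment and depletion analyses of annotations are often framed as a hypergeometric test. The sketch below illustrates that common approach only; it is not necessarily the statistic used in the study mentioned above. The function names and the counts in the usage lines are hypothetical.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(X = k) when drawing n items without replacement from N items,
    K of which carry the annotation of interest."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def depletion_p_value(k, N, K, n):
    """P(X <= k): chance of seeing k or fewer annotated items."""
    lowest = max(0, n - (N - K))  # smallest feasible overlap
    return sum(hypergeom_pmf(i, N, K, n) for i in range(lowest, k + 1))

def enrichment_p_value(k, N, K, n):
    """P(X >= k): chance of seeing k or more annotated items."""
    highest = min(K, n)  # largest feasible overlap
    return sum(hypergeom_pmf(i, N, K, n) for i in range(k, highest + 1))

# Hypothetical toy counts: N sites in total, K of them in the category of
# interest, n substitutions linked to a disease class, k of those n falling
# inside the category.
N, K, n, k = 1000, 100, 50, 1
print("depletion p-value:", depletion_p_value(k, N, K, n))
print("enrichment p-value:", enrichment_p_value(k, N, K, n))
```

When many ontology terms are tested at once, the resulting p-values would normally need a multiple-testing correction before any term is reported.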
We are recruiting! Open Postdoctoral position