A NEW SCALABLE BAYESIAN STATISTICAL METHOD FOR GENE PATHWAY DISCOVERY

CHANGWON YOO, S.M., PH.D.
Department of Biostatistics
Florida International University

TUESDAY, NOVEMBER 19, 2013
2:00 p.m. – 3:00 p.m., CRB 692

To understand the physiology of genes from cells involved in a complex disease, it is necessary to learn the causal relationships between those genes. To this end, it is ideal to compare genetic experiments with complete interventions, e.g., gene knockouts, to those with no interventions. Conducting genetic experiments with complete interventions on animal cells, e.g., mouse cells, is currently infeasible, but if and when the technology becomes available, scientists will need established statistical methods to detect causal relationships in these cases. The results can then be verified in wet-lab experiments.
To examine additional promising causal relationships that many current causal discovery algorithms are not guaranteed to visit, in this article we introduce a novel extension to an existing causal Bayesian network discovery algorithm, the Local Implicit latent variable scoring Method (LIM). The extension is the Equivalence checking Local Implicit latent variable scoring method with mixture of observational and intervention data (EquLIMmix). Because other algorithms may either miss causal relationships or predict them incorrectly, for every structure visited during LIM’s structure search, EquLIMmix also visits and scores the same structure with all directed arcs reversed. We hypothesize that EquLIMmix will improve on LIM’s ability to detect causal relationships from datasets mixing complete interventions with observational data.
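The reversal step described above can be sketched in Python. This is only an illustration under stated assumptions: the names `reverse_all_arcs`, `score_with_reversals`, and `toy_score` are hypothetical, and the toy scoring function is a placeholder, not LIM's latent variable score.

```python
from typing import Callable, FrozenSet, Iterable, List, Set, Tuple

Arc = Tuple[str, str]            # (parent, child)
Structure = FrozenSet[Arc]       # a network structure as a set of directed arcs


def reverse_all_arcs(structure: Structure) -> Structure:
    """Return the same structure with every directed arc reversed."""
    return frozenset((child, parent) for parent, child in structure)


def score_with_reversals(
    visited: Iterable[Structure],
    score: Callable[[Structure], float],
) -> List[Tuple[Structure, float]]:
    """Score each visited structure and its fully reversed counterpart.

    Structures that were already scored are skipped, so a reversal that
    coincides with an earlier structure is not scored twice.
    """
    seen: Set[Structure] = set()
    results: List[Tuple[Structure, float]] = []
    for structure in visited:
        for candidate in (structure, reverse_all_arcs(structure)):
            if candidate not in seen:
                seen.add(candidate)
                results.append((candidate, score(candidate)))
    return results


# Hypothetical placeholder score: counts arcs that match an assumed reference
# structure. It stands in for a real Bayesian score and is NOT LIM's score.
REFERENCE: Structure = frozenset({("B", "A"), ("C", "B")})


def toy_score(structure: Structure) -> float:
    return float(len(structure & REFERENCE))


visited_structures = [frozenset({("A", "B"), ("B", "C")})]
scored = score_with_reversals(visited_structures, toy_score)
best_structure, best_score = max(scored, key=lambda pair: pair[1])
```

In this toy run the search itself visits only A → B → C, yet the reversed structure C → B → A is also scored and wins under the placeholder score, which is the kind of relationship a search without the reversal step would not be guaranteed to visit.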
We use LIM and EquLIMmix to analyze simulated datasets mixing a small number of complete interventions per gene with observational data. To test both algorithms’ abilities to detect causal relationships from realistic data, we generate the datasets from a gene regulation pathway model of malignant mesothelioma formation proposed by an expert. Using the metrics of Area Under Receiver Operating Characteristic (AUROC) curve, Positive Predictive Value (PPV), Negative Predictive Value (NPV), Accuracy, and Shannon Entropy, we show that EquLIMmix exhibits clear advantages over LIM with smaller datasets and generally better performance with larger datasets. EquLIMmix therefore improves on LIM’s ability to detect causal relationships in gene networks from small (< 50) mixtures of observational and intervention data.
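As a rough illustration of several of the metrics named above, the following Python sketch computes PPV, NPV, and accuracy for a set of predicted arcs against a reference pathway, plus the Shannon entropy of a single arc probability. The gene names, arc sets, and function names are hypothetical, and treating entropy as the binary entropy of an arc's predicted probability is an assumption; the abstract does not say how entropy was applied.

```python
import math
from itertools import permutations
from typing import Dict, Set, Tuple

Arc = Tuple[str, str]  # (parent, child)


def arc_metrics(true_arcs: Set[Arc], predicted: Set[Arc],
                candidates: Set[Arc]) -> Dict[str, float]:
    """PPV, NPV, and accuracy over a fixed set of candidate arcs."""
    tp = len(predicted & true_arcs)              # predicted and present
    fp = len(predicted - true_arcs)              # predicted but absent
    fn = len(true_arcs - predicted)              # present but not predicted
    tn = len(candidates - predicted - true_arcs)  # absent and not predicted
    total = tp + fp + fn + tn
    return {
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        "npv": tn / (tn + fn) if tn + fn else float("nan"),
        "accuracy": (tp + tn) / total if total else float("nan"),
    }


def binary_entropy(p: float) -> float:
    """Shannon entropy (in bits) of an arc that is present with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


# Hypothetical three-gene example; every ordered pair is a candidate arc.
genes = ["A", "B", "C"]
candidates = set(permutations(genes, 2))
true_arcs = {("A", "B"), ("B", "C")}
predicted = {("A", "B"), ("C", "B")}   # second arc has the wrong direction

metrics = arc_metrics(true_arcs, predicted, candidates)
uncertain = binary_entropy(0.5)   # maximally uncertain arc
confident = binary_entropy(1.0)   # fully certain arc
```

Here one of the two predicted arcs is correct, so PPV is 0.5; three of the four arcs left unpredicted are truly absent, so NPV is 0.75. An arc probability of 0.5 gives one bit of entropy, while a probability of 0 or 1 gives zero.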