
SOFTWARE FOR DISTRIBUTED COMPUTATION ON MEDICAL DATABASES

BALASUBRAMANIAN NARASIMHAN, PhD
Senior Research Scientist
Department of Health Research and Policy and Department of Statistics
Stanford University
TUESDAY, APRIL 21, 2015
11:00 a.m.–12:00 p.m., CRB 692

Bringing together the information latent in distributed medical databases promises to personalize medical care by enabling reliable, stable modeling of outcomes with rich feature sets (including patient characteristics and treatments received). However, there are barriers to aggregation of medical data, due to lack of standardization of ontologies, privacy concerns, proprietary attitudes toward data, and a reluctance to give up control over end use. Statisticians have long known that aggregation of data is not always necessary for model fitting. In models based on maximizing a likelihood, the computations can be distributed, with aggregation limited to the intermediate results of calculations on local data, rather than raw data. We describe a set of software tools that allow the rapid assembly of a collaborative computational project, based on the flexible and extensible R statistical software and other open source packages that can work across a heterogeneous collection of database environments, with full transparency to allow local officials concerned with privacy protections to validate the safety of the method. We describe the principles, architecture, and examples of such distributed computation.
This is joint work with Samuel Gross from Statistics; Daniel Rubin and Marina Bendersky from Radiology; and Philip Lavori from Health Research and Policy, all at Stanford.
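
To illustrate the idea in the abstract that likelihood-based fitting needs only intermediate results from each site, here is a minimal sketch in Python. It is not the speakers' R-based software. The function names (local_summaries, fit_distributed), the choice of a two-parameter logistic regression fitted by Newton-Raphson, and the toy data are all assumptions made for illustration. Each site computes the gradient and information matrix of its own log-likelihood at the current parameter value, and the coordinator sees only those sums, never the patient-level rows.

```python
import math

def local_summaries(beta, rows):
    """Runs at one site. Returns the gradient and the information matrix
    (negative Hessian) of the local logistic log-likelihood at beta.
    Only these aggregates leave the site, not the (x, y) rows."""
    b0, b1 = beta
    g0 = g1 = 0.0
    i00 = i01 = i11 = 0.0
    for x, y in rows:
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        resid = y - p
        w = p * (1.0 - p)
        g0 += resid
        g1 += resid * x
        i00 += w
        i01 += w * x
        i11 += w * x * x
    return (g0, g1), (i00, i01, i11)

def fit_distributed(sites, max_iter=50, tol=1e-12):
    """Runs at the coordinator. Sums the per-site summaries and takes
    Newton-Raphson steps until the step size falls below tol."""
    beta = (0.0, 0.0)
    for _ in range(max_iter):
        g0 = g1 = i00 = i01 = i11 = 0.0
        for rows in sites:
            (a0, a1), (c00, c01, c11) = local_summaries(beta, rows)
            g0 += a0
            g1 += a1
            i00 += c00
            i01 += c01
            i11 += c11
        # Solve the 2x2 system (information matrix) * step = gradient.
        det = i00 * i11 - i01 * i01
        d0 = (i11 * g0 - i01 * g1) / det
        d1 = (i00 * g1 - i01 * g0) / det
        beta = (beta[0] + d0, beta[1] + d1)
        if abs(d0) + abs(d1) < tol:
            break
    return beta

# Hypothetical toy data: (covariate, binary outcome) pairs held at two sites.
site_a = [(0.0, 0), (1.0, 0), (2.0, 1), (3.0, 0)]
site_b = [(0.5, 0), (1.5, 1), (2.5, 1), (3.5, 1), (4.0, 0)]

beta_distributed = fit_distributed([site_a, site_b])
# Treating the pooled rows as a single site gives the centralized fit.
beta_pooled = fit_distributed([site_a + site_b])
```

Because the log-likelihood is a sum over patients, the summed per-site gradients and information matrices equal those of the pooled data, so the two fits agree up to floating-point rounding. A real deployment would also need the transport, authentication, and disclosure checks that the abstract alludes to, none of which are modeled here.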