III: Medium: Causal inference in biobanks: Leveraging genetics to infer causal relationships using electronic health records
University of California-Los Angeles, Los Angeles, CA
Abstract
The past several years have witnessed major efforts to collect genetic data from patients in large health systems to enable research aimed at improving patient health. These data can potentially identify risk factors that cause disease and improve treatment. However, the observational nature of these datasets makes such inferences challenging, in part because of the difficulty of distinguishing correlation from causation, which can obscure true relationships. This project will utilize and extend recently developed techniques in causal inference to overcome this difficulty and identify causal relationships within the medical data. Advancing this research is critical for improving the outlook for individuals who suffer from today’s most prevalent common, complex disorders, and it will also provide general insights into the analysis of observational data. The project leverages efforts at UCLA to broaden participation in computing and will incorporate graduate and undergraduate students from diverse backgrounds.
We propose to leverage modern techniques for causal inference, coupled with the unique characteristics of genetic data collected in biobanks, to solve three key problems in biomedicine and epidemiology: identifying risk factors for disease, predicting likely responders to a potential treatment, and identifying latent disease subtypes. The advance in causal inference most directly relevant to our problem is the theory of causal graphs as a unifying framework for representing and reasoning about causal effects. We will use these graphs to test and estimate causal relationships between relevant exposures measured in the biobank and diseases (for example, LDL cholesterol and heart attack). Crucially, we will leverage the availability of genetic data to serve as causal anchors (or instrumental variables) that enable the estimation of causal effects even in the presence of confounders, expanding the technique of Mendelian randomization that is widely used in epidemiology.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
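The instrumental-variable idea mentioned in the abstract can be illustrated with a minimal sketch on simulated data. This is not the project's method. The function names (`simulate`, `slope`, `wald_ratio`), the allele frequency, and all effect sizes are assumptions chosen for illustration. A genotype influences an exposure, an unmeasured confounder influences both exposure and outcome, and the Wald ratio (a basic single-variant Mendelian randomization estimator) is compared with a naive regression of outcome on exposure.

```python
import numpy as np


def simulate(n=100_000, true_effect=0.5, seed=0):
    """Simulate a genotype, a confounded exposure, and an outcome (all hypothetical)."""
    rng = np.random.default_rng(seed)
    g = rng.binomial(2, 0.3, size=n).astype(float)  # allele count 0/1/2 (instrument)
    u = rng.normal(size=n)                          # unmeasured confounder
    x = 0.4 * g + u + rng.normal(size=n)            # exposure, e.g. a biomarker level
    y = true_effect * x + u + rng.normal(size=n)    # outcome depends on x and on u
    return g, x, y


def slope(a, b):
    """Ordinary least-squares slope from regressing b on a (with intercept)."""
    a_centered = a - a.mean()
    return float(a_centered @ (b - b.mean()) / (a_centered @ a_centered))


def wald_ratio(g, x, y):
    """Instrumental-variable estimate: (effect of g on y) / (effect of g on x)."""
    return slope(g, y) / slope(g, x)


if __name__ == "__main__":
    g, x, y = simulate()
    print("naive regression of y on x:", slope(x, y))   # biased by the confounder u
    print("Wald ratio using genotype g:", wald_ratio(g, x, y))
```

In this simulation the naive slope absorbs the confounder's contribution, while the Wald ratio lands near the simulated effect because the genotype is independent of the confounder and affects the outcome only through the exposure. Those two conditions are assumptions built into the simulation; in real biobank data they have to be argued for rather than taken as given.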