EAPSI: Developing Statistical Methods for Removing Unwanted Variation with Negative Controls in Genetics and Causal Inference

$400 | FY2017 | O/D | NSF

Hunter Kristen B, Cambridge MA

Investigators

Abstract

Unwanted variation is a common problem in statistical analysis that makes it difficult for researchers to distinguish between signal and noise. For example, in the field of genetics, a researcher might try to detect the difference in gene expression between a cancerous tumor and a nearby benign region. The difference between the two regions is the variation of interest, or wanted variation. However, differences in genetic techniques between laboratories can create unwanted variation in the data, which can lead the researcher to incorrect conclusions. The unwanted variation can either induce bias, distorting the true relationship between genes and diseases, or decrease precision, masking the variation of interest. Ideally, the researcher would like to remove the unwanted variation. One way to achieve this goal is through a negative control, which is a variable that is associated with the unwanted variation but not with the variation of interest. The goal of this project is to develop statistical techniques for using negative controls to effectively remove unwanted variation while keeping the variation of interest. These techniques will be useful in a wide variety of fields, but the project will focus on genetics and causal inference, including experimental design. This research will be conducted in collaboration with Professor Terence Speed, a leading expert on negative controls in this context, at the Walter and Eliza Hall Institute of Medical Research in Melbourne, Australia.

The fields of epidemiology, causal inference, and genetics all have methods related to removing unwanted variation. This project will synthesize approaches from these fields to build new statistical methods for negative controls. The existing methods include hypothesis testing, propensity score matching, and parametric models, and they rely on different frameworks and assumptions. This project will bring together the flexibility of causal inference techniques, the statistical efficiency of genetics methods, and the rigorous hypotheses of epidemiology.

This award, under the East Asia and Pacific Summer Institutes program, supports summer research by a U.S. graduate student and is jointly funded by NSF and the Australian Academy of Science.
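
The negative-control idea described in the abstract can be illustrated with a small simulation. The sketch below is not the project's method; it is a minimal, generic factor-based adjustment written for illustration. The function name `adjust_with_negative_controls`, the simulated data, and all parameter values are assumptions. Genes known to be unrelated to the factor of interest (the negative controls) are used to estimate the unwanted factor, which is then included in a regression alongside the factor of interest.

```python
import numpy as np

def adjust_with_negative_controls(Y, x, control_idx, k=1):
    """Hypothetical helper: estimate k unwanted factors from negative-control
    columns of Y, then regress every column on [intercept, x, factors]."""
    centered = Y - Y.mean(axis=0)
    # Negative controls carry unwanted variation only, so their leading
    # singular vectors approximate the unwanted factors across samples.
    U, _, _ = np.linalg.svd(centered[:, control_idx], full_matrices=False)
    W = U[:, :k]
    design = np.column_stack([np.ones(len(x)), x, W])
    coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
    beta = coef[1]                 # estimated effect of x on each gene
    adjusted = Y - W @ coef[2:]    # data with estimated unwanted part removed
    return beta, adjusted

# Simulated example (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n_samples, n_genes, n_controls = 200, 50, 20
x = rng.integers(0, 2, n_samples).astype(float)   # e.g. tumor vs. benign
w = 0.8 * x + rng.normal(size=n_samples)          # lab effect, correlated with x
true_beta = np.r_[np.zeros(n_controls), np.ones(n_genes - n_controls)]
alpha = rng.normal(2.0, 0.5, n_genes)             # gene sensitivity to lab effect
Y = np.outer(x, true_beta) + np.outer(w, alpha) \
    + rng.normal(scale=0.5, size=(n_samples, n_genes))

# Naive estimate: regress on x alone, ignoring the unwanted variation.
naive_design = np.column_stack([np.ones(n_samples), x])
naive_beta = np.linalg.lstsq(naive_design, Y, rcond=None)[0][1]

beta_hat, Y_adj = adjust_with_negative_controls(Y, x, np.arange(n_controls), k=1)
naive_err = np.mean(np.abs(naive_beta - true_beta))
adj_err = np.mean(np.abs(beta_hat - true_beta))
print(f"mean abs error, naive: {naive_err:.3f}; adjusted: {adj_err:.3f}")
```

In this simulated setting the unwanted factor is correlated with the factor of interest, so the naive estimate is biased, while the adjusted estimate should land much closer to the simulated truth. The sketch assumes the negative controls are truly unaffected by the factor of interest and that the number of unwanted factors `k` is known; both are strong assumptions in practice.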
