Conference: Drawing Causal Inference from Big Data

$100,000 | FY2015 | CSE | NSF

Indiana University, Bloomington IN

Investigators

Abstract

A conference titled "Drawing Causal Inference from Big Data" will be held March 26 and 27, 2015, in the National Academy of Sciences auditorium in Washington, DC. The purpose of this conference is to present state-of-the-art approaches to the problem and to bring together leading experts, both the featured speakers and other experts, who will generate progress through their interactions. In many respects the subject of this conference is in its infancy: the many methods that have been developed and used for causal inference in small data do not scale up, Big Data is often collected in the field in uncontrolled fashion, and the sheer size of the data, contrary to popular belief, makes it more rather than less difficult to identify causal effects. The problems in dealing with Big Data are in good part rooted in the limitations of human cognition, so ongoing efforts are aimed at the development of computational algorithms. However, it is likely that computational techniques are best viewed as augmenting rather than replacing human insight: current algorithms can find complex patterns and associations, but most are not designed to discover causal explanations. The conference also addresses the appropriate way to define causality in large data collected from chaotic and noisy systems, and the way to find causes that lie outside the measured variables. For example, a correlation observed in a health survey based on genetic mapping might be due to an unmeasured environmental factor such as poverty. The subject of the conference is of vital and current interest to every field of study, to business, and to government agencies. Our society has developed methods of collecting and storing enormous amounts of data, and is increasingly doing so.
The data can arrive from controlled experiments, but most often comes from relatively uncontrolled field observations, such as those from social networks, human medical and genetic measurements, and patterns of purchases. The amount of data has far outstripped our ability to discern what important patterns are in the data and, most importantly, what explains those patterns. In a typical large database there is a huge number of variables that can be measured, and a virtually uncountable number of correlations among different subgroups of those variables. There are enormous potential benefits to science, business, government, and society if the critical patterns in Big Data can be not only ascertained but also explained. Explanation is the goal of this conference, represented by the phrase "drawing causal inference." The most pressing questions we face are causal in nature. In health we might observe that a particular treatment is associated with a decrease in cancer deaths, but need to know if the treatment is the cause of the decrease. In education we might observe that students held back in early grades tend to drop out of high school, but need to know if being held back causes that result.
