Statistical Methods for Causal Inference in Geographic Regression Discontinuity Designs

$330,000 · FY2015 · SBE · NSF

Harvard University, Cambridge MA

Investigators

Abstract

This research project will develop methods to improve understanding of estimates from geographically referenced observational studies. These methods will enable researchers to accurately assess uncertainty for this class of problems, thereby reducing the chance of over-confident or potentially misleading results being presented to the public. When one is faced with geographically referenced data, such as those collected by population registries or satellite imagery, attempts to infer causal relationships between treatments and outcomes often are thwarted by the complex underlying spatial structure. For example, one might wish to estimate the influence of flooding on anxiety by comparing units in the flood zone to units outside the flood zone. However, underlying, unmeasured, and geographically varying characteristics, such as socio-economic status, may confound this relationship. Common approaches to these problems often substantially underestimate uncertainty, have serious issues of bias, or rely on very strong modeling assumptions, potentially leading to erroneous conclusions and findings. This project will address these issues by developing methods that adequately characterize and model spatial variation in the context of causal inference. The researchers also will develop and release software for analysis, and they will host a workshop focusing on this topic.

This research project bridges the fields of causality and spatial statistics and offers a unified framework for inferring causal relationships in spatially referenced data. The researchers will extend the regression discontinuity design framework, where units just above and below some cut-point that determines treatment are compared to infer a causal relationship. For example, one might compare those on either side of a high-water mark in a flood, with the assumption that due to their geographic proximity, such units, other than having experienced flooding, are similar. However, unlike classic regression discontinuity, here the boundary is a line rather than a point, which substantially complicates analysis. The project will create and evaluate flexible tools to handle these complications and demonstrate how to use these tools in real-world contexts. The project's most significant theoretical contribution will be to fit the slope of the response surface with respect to the boundary rather than the response surface itself, which allows for appropriate extrapolation. This enhancement, coupled with the flexible nature of fitting surfaces using spatial tools, will allow for the preservation of effective random assignment of units across a treatment boundary. It also will allow for causal inference with relatively few and weak modeling assumptions, something that is critical in an observational data context.
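
To make the regression discontinuity idea concrete, the following is a minimal sketch on simulated data, not the project's method or software. It assumes a hypothetical straight boundary (the line north = 0), a hypothetical true effect of 2.0, and a spatially varying confounder; all names and parameter values are illustrative. It reduces the two-dimensional problem to a signed distance from the boundary and fits a kernel-weighted line on each side, which is the common simplification that the abstract describes as complicated by the boundary being a line rather than a point.

```python
import numpy as np


def local_linear_rd(distance, outcome, bandwidth):
    """Estimate the jump in outcome at distance = 0.

    Fits a triangular-kernel weighted line separately on each side of
    the boundary and returns the difference of the two intercepts.
    """
    def intercept_at_zero(side_mask):
        d, y = distance[side_mask], outcome[side_mask]
        # Triangular kernel: weight falls linearly to 0 at the bandwidth.
        w = np.clip(1.0 - np.abs(d) / bandwidth, 0.0, None)
        keep = w > 0
        X = np.column_stack([np.ones(keep.sum()), d[keep]])
        sw = np.sqrt(w[keep])
        # Weighted least squares via rescaled ordinary least squares.
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y[keep] * sw, rcond=None)
        return beta[0]

    return intercept_at_zero(distance >= 0) - intercept_at_zero(distance < 0)


# Simulated units scattered over a square region (all values hypothetical).
rng = np.random.default_rng(0)
n = 4000
east = rng.uniform(-1, 1, n)
north = rng.uniform(-1, 1, n)

treated = north >= 0          # assumed boundary: the line north = 0
true_effect = 2.0             # assumed treatment effect
# Geographically varying confounder (e.g. a socio-economic gradient).
confounder = 1.5 * north + 0.5 * np.sin(3 * east)
outcome = confounder + true_effect * treated + rng.normal(0, 0.5, n)

# Naive comparison of all treated vs. all untreated units absorbs the gradient.
naive = outcome[treated].mean() - outcome[~treated].mean()
# Comparing only units near the boundary removes most of that confounding.
estimate = local_linear_rd(north, outcome, bandwidth=0.3)

print(f"naive difference in means: {naive:.2f}")
print(f"local linear RD estimate:  {estimate:.2f}")
```

In this simulation the naive difference in means is inflated by the north-south gradient, while the boundary-local estimate should land much closer to the assumed effect. The sketch ignores variation along the boundary and spatial correlation in the noise, which are among the issues the project proposes to address.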
