Efficient Estimation of Treatment Effects via Nonparametric Machine Learning

$154,390 | FY2020 | MPS | NSF

University of California-Riverside, Riverside, CA

Abstract

Advances in technology have produced numerous large-scale observational datasets, which bring unprecedented opportunities for evaluating the effectiveness of various treatments. The complex nature of such data, including its massive volume and high-dimensional confounders, poses great challenges to conventional methods for causal analysis. The statistical implication is that even a small amount of bias can easily lead to erroneous conclusions. Thanks to the rapid development of scalable computing techniques, nonparametric machine learning methods have strong potential for reducing bias through data-driven dimensionality reduction. However, careful consideration is needed to realize this potential for studying the underlying causal mechanisms. This project aims to develop new statistical methods, with theoretical insights, for causal analysis using deep neural networks. The new statistical tools meet immediate needs in various scientific areas for exploring causal relationships from large-scale observational data. The research will facilitate the causal analysis of modern complex data with important applications. This project will integrate research and education through course development, open-source software development, and undergraduate and graduate student training.

The PI will develop a new unified approach, with thorough theoretical justification, for efficient estimation of causal effects using deep neural networks. The method will then be applied to large-scale datasets with binary, multi-valued, or continuous treatment variables. Three interconnected topics will be pursued. Specifically, the PI will offer a new perspective on learning treatment effects through a generalized optimization estimation framework. As a result, two convenient and efficient estimators of treatment effects will be developed in the second and third topics, respectively. The estimators involve one nuisance model, to be approximated by deep neural networks, which will be investigated in the first topic. The general optimization framework includes the average, quantile, and asymmetric least squares treatment effects as special cases. The methods take full advantage of the large sample sizes of large-scale data and provide effective protection against model misspecification bias. The project involves devising new machine learning methods and algorithms for causal studies, establishing their theoretical validity, and developing valid inference procedures. It will promote machine learning methods for causal analysis and will break new ground in drawing causal inferences from large-scale observational datasets.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
