Partitioning-Based Learning Methods for Treatment Effect Estimation and Inference

$453,195 · FY2023 · SBE · NSF

Princeton University, Princeton NJ

Investigators

Abstract

As data-driven technologies are increasingly used in high-stakes decision-making, the need for fast, easily interpretable algorithms for analyzing large data sets has grown. A common approach is adaptive partitioning, in which the data set is recursively partitioned and the resulting cells are used in the analysis. However, the rationale behind these partitioning methods is not well understood, because the broad adoption of machine learning methods in applications has not always been supported by the development of theoretical and methodological tools for understanding their properties. This research project will study the rationale behind algorithms used in machine learning and other methods for large-scale data analysis. The research will then develop new and improved methods and apply them to several economic problems. The results of this research will not only significantly improve machine learning and other large-scale data analysis, but will also improve decision-making generally, increase economic growth, and help establish the US as a global leader in large-scale data analysis and machine learning.

A technical challenge in formally studying adaptive partitioning and other flexible learning methods is that the randomness introduced by the often-recursive partitioning scheme is difficult to account for. This research will provide an array of theoretical and methodological results, both positive and negative, for adaptive partition-based and other flexible learning methods. The research will develop new treatment effect estimation and inference methods and guide practice in program evaluation and causal inference. One of the main results shows that many popular recursive partitioning methods for heterogeneous treatment effect estimation can be pointwise (and hence uniformly) inconsistent over the support of the conditioning variables. Other results show that adaptive oblique decision trees can have accuracy on par with neural networks. The research also studies non-linear partitioning-based methods with applications to quantile regression and treatment effects. Spin-off projects on causal inference and program evaluation will also be undertaken as part of this research. The results of this research will improve the analysis of large-scale data, such as those used to make decisions, and thus improve program evaluation. Besides improving economic decision-making and economic growth, the results will also establish the US as a global leader in program evaluation.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
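To make the abstract's notion of adaptive (recursive) partitioning for heterogeneous treatment effects concrete, the following is a minimal illustrative sketch, not the project's method. It assumes a single covariate x, a binary treatment d, and an outcome y, and it estimates the effect in each cell by a simple difference in means. The function names (`grow`, `diff_in_means`, `has_both_arms`, `simulate`), the split criterion, and the simulated data are all hypothetical choices made for this example.

```python
import random

def diff_in_means(rows):
    """Treated-minus-control mean outcome in one cell; rows are (x, d, y)."""
    treated = [y for _, d, y in rows if d == 1]
    control = [y for _, d, y in rows if d == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

def has_both_arms(rows, min_arm):
    """True if the cell holds at least min_arm treated and min_arm control units."""
    n_treated = sum(d for _, d, _ in rows)
    return n_treated >= min_arm and len(rows) - n_treated >= min_arm

def grow(rows, depth, min_arm=20):
    """Recursively split on x and return a list of (x_min, x_max, effect) cells."""
    xs = sorted(x for x, _, _ in rows)
    best = (0.0, None, None)  # (score, left rows, right rows)
    if depth > 0:
        for cut in xs[1:]:
            left = [r for r in rows if r[0] < cut]
            right = [r for r in rows if r[0] >= cut]
            if not (has_both_arms(left, min_arm) and has_both_arms(right, min_arm)):
                continue
            # Adaptive step: the cut is chosen using the outcomes themselves.
            # Assumed criterion: squared gap in cell effects, weighted by cell sizes.
            gap = diff_in_means(left) - diff_in_means(right)
            score = gap * gap * len(left) * len(right)
            if score > best[0]:
                best = (score, left, right)
    if best[1] is None:  # no admissible cut improves the score: stop here
        return [(xs[0], xs[-1], diff_in_means(rows))]
    return grow(best[1], depth - 1, min_arm) + grow(best[2], depth - 1, min_arm)

def simulate(n, noise, seed=0):
    """Hypothetical data: the true effect is 1 for x >= 0.5 and 0 otherwise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random()
        d = rng.randint(0, 1)
        effect = 1.0 if x >= 0.5 else 0.0
        rows.append((x, d, effect * d + rng.gauss(0.0, noise)))
    return rows

for x_min, x_max, effect in grow(simulate(400, noise=0.5), depth=2):
    print(f"x in [{x_min:.2f}, {x_max:.2f}]: estimated effect {effect:+.2f}")
```

In this sketch the same outcomes are used both to choose the cuts and to estimate the effect within each cell. That reuse is the kind of partition-induced randomness the abstract describes as difficult to account for in formal analysis.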
