New Challenges in Statistical Inference with Regularized Optimal Transport
Cornell University, Ithaca NY
Abstract
Driven by the abundance of data and by computational advances, the application domain of statistical inference is ever-growing. Human-facing technologies, such as autonomous vehicles and robot-assisted surgery, demand principled inference methods with rigorous performance guarantees. Because many inference tasks reduce to comparing probability distributions, optimal transport theory, which provides a powerful framework for doing so, has emerged as a tool of choice for designing and analyzing inference methods. However, statistical optimal transport is bottlenecked by the curse of dimensionality: quantitative results either deteriorate exponentially with dimension (for example, estimation rates) or are largely unavailable (for example, limit distributions and resampling methods). To overcome this impasse, this project will explore modern regularization techniques for optimal transport distances and develop a comprehensive statistical theory to facilitate principled inference in high dimensions. This work is expected to have a strong impact on the broad application domain of statistical inference in industry, commerce, science, and society, by promoting principled implementations at scale backed by theoretical assurances. In conjunction, the educational component will provide rigorous training and diverse recruitment opportunities for students, along with a deliberate plan to promote collaborations between the statistics and engineering communities working on optimal transport and related fields.
This project will explore three prominent optimal transport regularization methods: (1) smoothing via convolution with a chosen kernel; (2) slicing via lower-dimensional projections; and (3) convexification via an entropic penalty. These techniques preserve the favorable structure of classical optimal transport while reducing its complexity, which opens the door to a scalable statistical theory.
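As an illustration of the second technique only (slicing), the following is a minimal sketch and not the project's method. It assumes two samples of equal size, uses uniformly random projection directions, and estimates the sliced 1-Wasserstein distance by Monte Carlo. The function name `sliced_wasserstein_1` and its parameters are hypothetical and chosen for this example.

```python
import numpy as np


def sliced_wasserstein_1(x, y, n_projections=200, seed=0):
    """Monte Carlo estimate of the sliced 1-Wasserstein distance.

    Assumes x and y are arrays of shape (n, d) with the same n
    (an assumption of this sketch, not a requirement of the theory).
    """
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    # Draw random directions and normalize them onto the unit sphere.
    theta = rng.standard_normal((n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Project both samples onto every direction and sort each projection.
    px = np.sort(x @ theta.T, axis=0)
    py = np.sort(y @ theta.T, axis=0)
    # In one dimension with equal sample sizes, W1 is the mean absolute gap
    # between sorted samples; averaging over directions gives the sliced value.
    return float(np.mean(np.abs(px - py)))
```

Each one-dimensional problem costs only a sort, which is the sense in which slicing reduces complexity relative to solving a full transport problem in dimension d.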
The research agenda will tackle key theoretical challenges concerning statistical inference with regularized optimal transport distances, encompassing empirical error rates, limit distributions, semiparametric efficiency, resampling methods, Berry-Esseen-type bounds, and computational-statistical gaps. The developed theory will be leveraged to address various inference applications, including generative modeling, testing, vector quantile regression, and intrinsic dimension estimation. The project will establish the theoretical underpinnings of inference methods at scale based on optimal transport theory, providing guidance and insight for practical implementations.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.