Statistical Inference for Optimal Transport
Carnegie Mellon University, Pittsburgh PA
Investigators
Abstract
This research project concerns optimal transport, a mathematical method for transforming one probability distribution into another. Optimal transport has been used to transfer data from one scientific domain to another, enabling scientists to combine data from different sources. It has also been used to ensure that algorithms do not create unintended biases against demographic groups. The focus of this project is to develop rigorous statistical methods for optimal transport that permit precise assessment of the uncertainty that arises because only finite datasets are available. The methods will be used with collaborators in particle physics to address data analysis problems that arise in data obtained from particle accelerators. Graduate students will be trained by including them in the research. The Division of Physics provides cofunding for this award.
This project has three thrusts. The first is to develop statistical inference for transport maps. The aim is to prove central limit theorems for estimated transport maps and then use these theorems to construct confidence intervals. The investigators will also extend inferential ideas to robust versions of transport and to the Gromov-Wasserstein distance, which extends the idea of transport to measures on different spaces. They will then consider semiparametric theory (double robustness), higher-order inference, and optimal hypothesis testing. The second thrust is the development of new transport maps. By departing from the original definition, the investigators can derive slightly less efficient maps that are easier to estimate and that still have good properties. The third thrust is to apply the methods to the physical sciences. This includes using optimal transport to estimate background distributions in particle physics, to perform simulator-based inference, to quantify the systematic uncertainty in particle unfolding, and to decorrelate signal classifiers from protected variables. The investigators will also develop missing data transport for use when data from a signal region is not available.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.