Collaborative Research: Statistical Optimal Transport in High Dimensional Mixtures

$199,999 · FY2022 · MPS · NSF

Cornell University, Ithaca NY

Abstract

This project studies high-dimensional mixture models, a class of statistical models that can be used to analyze data arising in linguistics, computational biology, and particle physics. This research project aims to define a new measure of distance between distributions that measures their similarity with respect to a mixture model and offers a principled way to compare, transform, and analyze high-dimensional data sets. As part of this project, the investigators will develop fast algorithms for estimating this distance and theoretical guarantees allowing this distance to be used for statistical inference. Specifically, this project defines a sketched Wasserstein distance (SWD) and will develop its computational and statistical properties. The primary aims are to establish duality relations for this distance, to develop computationally feasible estimators for SWD using both primal and dual formulations, and to study the rates of convergence of the new estimators. In addition, the research aims to develop lower bounds that establish the rate optimality of these estimators and to establish distributional limits that allow for the construction of asymptotically valid confidence intervals. These tools will be applied to data in text analysis, systems biology, and high-energy physics. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
