CAREER: Robust and Efficient Algorithms for Statistical Estimation and Inference

$400,000 | FY2021 | MPS | NSF

University of Southern California, Los Angeles, CA

Investigators

Abstract

Statistical and machine learning methods are useful for making data-driven decisions. These methods are particularly advantageous in the presence of uncertainty originating from measurement errors or randomness inherent to the data collection process itself. The principal investigator (PI) will pursue a research program to develop statistical and machine learning methods possessing two important characteristics: robustness and efficiency. Robust algorithms are characterized by their ability to perform well even when some of the data are not accurate and clean but instead are "outliers," such as completely irrelevant or grossly corrupted measurements. Robust techniques can help to reduce the amount of resources spent on human-supervised data cleaning and preprocessing. Efficient methods, on the other hand, are able to extract most of the useful information contained in the data, thereby reducing the amount of uncertainty in the data-guided decision. This research program will be integrated with educational activities that, among other things, will expose undergraduate and graduate students to cutting-edge approaches in statistics and machine learning, and give students an opportunity to serve as individual tutors and mentors at local K-12 schools.

One part of the research program is devoted to investigating the connections between self-normalized sums and robust statistical techniques. In particular, the PI will demonstrate that, unlike many existing approaches, algorithms based on self-normalized sums often give rise to efficient methods, for instance in the context of univariate and multivariate mean estimation. Analysis of such algorithms is closely related to the theory of self-normalized processes.
Another part of the research focuses on the asymptotic properties of U-statistics of growing order, and the implications of these properties for robust and efficient empirical risk minimization (ERM), one of the key principles underlying modern mathematical statistics and machine learning algorithms. The PI will introduce a new approach to robust ERM and will relate questions about the efficiency of the resulting algorithms to purely mathematical questions in the theory of U-statistics. Finally, the research will address uncertainty quantification in robust statistics using Bayesian methods. Specifically, the PI aims to develop new robust analogues of the standard posterior distribution based on U-statistics of growing order, and will investigate the asymptotic behavior of these robust posteriors as well as the asymptotic frequentist properties of the corresponding credible sets.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
