GGrantIndex
← Search

CAREER: Statistics through the Sum of Squares Lens

$650,000FY2023CSENSF

Massachusetts Institute Of Technology, Cambridge MA

Investigators

Abstract

Statistics with large data sets of high-dimensional observations (e.g., images, videos, genomes) forms the basis for modern machine learning and data science. Despite extraordinary successes in practice and the potential for tremendous further impact, theoretical understanding of the basic algorithmic building blocks for high-dimensional statistics remains limited. Unlike in traditional (low-dimensional) statistics, the key bottleneck is not how much data is available but how much computational power is available to process it. The central aim of this proposal is to advance a theory of algorithmic statistics which can address the core questions: which problems in high-dimensional statistics have computationally efficient algorithms? And, what are the best -- most accurate, most robust, fastest, etc. -- algorithms for such problems? Pursuing answers to these questions is likely to lead both to foundational algorithmic innovations as well as to a deeper understanding of the fundamental limitations of efficient computation in statistical settings, which is a prerequisite for understanding when a given algorithm is the best possible at the task it performs. Furthermore, the curriculum development and mentoring components of this project will disseminate state-of-the-art algorithm design techniques to the broader scientific community via courses, lectures, and videos and train the next generation of algorithm designers for science and industry. This project will develop the Sum of Squares method (SoS) as a powerful tool for algorithm design in statistical settings and as a lens on fundamental limitations. SoS is a problem-independent approach to designing algorithms with strong provable guarantees, generalizing tools such as linear programming, and eigenvalue/eigenvector methods. SoS has already broadly impacted algorithms in numerous areas: combinatorial optimization, quantum information, cryptography, control theory, high-dimensional statistics, and more. Even within high-dimensional statistics, SoS already has many applications in clustering, robust estimation, and tensor decomposition (the inverse problem underlying the method of moments), to name only a few. However, knowledge of SoS remains incomplete: there is tremendous potential to push the frontier of efficient algorithms by studying SoS, including in high-dimensional statistics. The core technical thrust of this project will be developing novel algorithms based on SoS as well as the mathematical tools to understand the guarantees and fundamental limitations of those algorithms. The project will focus on algorithm design in two specific domains, privacy-preserving data analysis and Bayesian inference, where the investigator believes that there are opportunities for new algorithms with strong provable guarantees to offer new insights on algorithm design broadly, as well as to have a near-term impact on practice. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

View original record on NSF Award Search →