CAREER: Learning and property testing -- a complexity theoretic perspective

$420,983FY2021CSENSF

University Of Pennsylvania, Philadelphia PA

Investigators

Abstract

From genomics to meteorology, advanced technology has led to collection of data at an unprecedented scale. In turn, the future promise of science is crucially reliant on our ability to analyze such massive and complex datasets. To gain deeper insights here, modern data analysis relies on mathematical abstractions called models. In particular, a central desideratum of machine learning is to find simple models which fit a given dataset. Accordingly, theoretical computer science, machine learning and statistics have developed both mathematical and algorithmic frameworks in this pursuit. The goal of this project is to put this agenda under the lens of computational complexity theory. In particular, using tools and insights from complexity theory, the investigator will study (i) efficient learning algorithms for several high-dimensional models which appear in machine learning and statistics, and (ii) explore the tradeoffs between robustness, efficiency, data size and accuracy for these models. Besides the mathematical foundations of machine learning, the underlying questions also appear in a wide variety of applications ranging from evolutionary biology to signal processing. Through the introduction of new courses at the intersection of complexity theory, machine learning and statistics, as well as other activities such as workshops and seminars, the project will train the next generation of graduate students who will achieve fluency in all of these areas leading to a further cross-pollination of ideas between these disciplines. The project will also support activities aimed at attracting undergraduate students to research in theoretical computer science. The project has two principal thrusts: (1) Noise tolerant learning algorithms for high-dimensional models: The project will study both supervised and unsupervised models such as linear separators, Gaussian mixture models and trace reconstruction (which is an abstraction of the problem of ancestral DNA reconstruction from evolutionary biology). For these models, the project will explore the three-way tradeoff between sample efficiency, computational efficiency and noise tolerance. (2) Algorithms to test high dimensional models for sparse representations: Examples of such models include Boolean functions over the hypercube and high-dimensional matrices. The overarching goal here is to design algorithms whose complexity is independent of the ambient dimension, thus avoiding the proverbial curse of dimensionality. Underlying this broad agenda is the insight that Boolean function analysis -- a suite of tools developed in complexity theory which combines combinatorics, harmonic analysis and probability theory -- is an effective algorithmic tool for learning and testing high dimensional models in machine learning and statistics. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

View original record on NSF Award Search →