CAREER: Overparameterization in modern machine learning: A panacea or a pitfall?

$461,305 · FY2023 · CSE · NSF

Georgia Tech Research Corporation, Atlanta GA

Abstract

Deep neural networks overwhelmingly dominate the empirical machine learning landscape. Their state-of-the-art performance, however, remains poorly understood, brittle, and resource-intensive to obtain. In particular, their good generalization properties, or their ability to make accurate predictions on previously unseen data, are largely unexplained. Especially unusual is that, in contrast to classical machine learning models, state-of-the-art neural networks are frequently heavily overparameterized; that is, much “larger” than their training data set. Recent research has revealed a better understanding of the possible benefits of such overparameterization, but only in elementary model families. The ramifications of overparameterization in deep neural networks, which exhibit complex and distinct behaviors, present many unknowns. In the absence of a first-principles theory, outstanding failure modes in deep neural networks remain unmitigated or unnecessarily costly to solve, and architecture selection is conducted in a wasteful trial-and-error manner that involves repeated train-and-test cycles. This keeps deep learning technology from reaching its full potential, particularly in high-stakes and resource-limited applications. This project will bridge the gap between the recent theory of overparameterized linear models and real-world neural networks through a range of mathematical techniques spanning signal processing, information theory, and online decision-making. In particular, the project will: 1) examine the implications of overparameterization on the test regression and classification performance of deep neural networks; 2) characterize the robustness of overparameterized models (both linear and nonlinear) to adversarial perturbations and significant shifts in the data distribution; and 3) design robust principles for data-driven model selection in modern machine learning.
Ultimately, this project aims to establish foundational mathematical principles to explain not only the successful generalization of modern machine learning, but also its failure modes, in turn paving the way for developing efficient and principled solutions. This project will also create and disseminate educational resources at the high school and undergraduate levels on elementary signal processing, machine learning, and data science that underlie and complement the described research. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.