CAREER: Neural Networks in the Practical Regime
University of California-Los Angeles, Los Angeles, CA
Abstract
This award is funded in whole or in part under the American Rescue Plan Act of 2021 (Public Law 117-2). Deep learning, the predominant approach in modern artificial intelligence, uses multi-layer artificial neural networks to infer complex relations from data. Despite the rapid adoption of deep learning in scientific and industrial applications, a theoretical basis that explains its success and methods that guard against its limitations have yet to be established. This project addresses two critical gaps in the current theory of deep learning: the effects of working with specific types of data, and the behavior of artificial neural networks in practical settings. This research is crucial to developing deep learning methods that are more efficient, reliable, safe, and broadly applicable. Areas that would benefit directly include faster production pipelines, better energy efficiency, and improved data handling in the growing number of critical technologies that rely on deep learning. The project involves significant educational, community-building, and outreach activities. The research will be integrated directly into interdisciplinary curricula and will generate research topics for graduate students and capstone projects for undergraduate students. The project aims to prepare a diverse STEM workforce through long-term research and career mentoring for undergraduate students, graduate students, and postdocs; open seminars and discussion sessions with experts and enthusiasts at different career stages; internship opportunities; and dedicated training sessions. Artificial neural networks parametrize specific sets of candidate solutions to learning tasks, and parameter optimization procedures introduce specific preferences within the space of possible solutions.
This project seeks to illuminate these complex relations in cases of practical interest that existing mathematical theory does not adequately cover, namely, cases in which the networks are only moderately overparametrized relative to the amount of training data. Importantly, it develops theories and methods that integrate and exploit the properties of the training data and of the parameter initialization. The research pursues three aims: (1) a function-space description of moderately overparametrized networks, (2) a data-dependent description of the objective function and its optimization, and (3) an explicit form of the bias of gradient descent in function space. The research program advances the state of the art by integrating the properties of the training data and by addressing the interplay between optimization bias and model bias, challenges that lie beyond the scope of existing methods. The project builds on preliminary work that blends applied mathematics and deep learning, in particular techniques connecting the geometry of parameter space, function space, and data space, as well as techniques based on information geometry, optimal transport, and algebraic statistics. The project will further develop important connections among geometry, probability, statistics, and machine learning, and will offer unique opportunities for interdisciplinary research and education. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.