CAREER: Theoretical foundations of neural networks - representation, optimization, and generalization

$511,520 · FY2018 · CSE · NSF

University Of Illinois At Urbana-Champaign, Urbana IL

Abstract

Neural networks form the backbone of machine learning's recent advances and sudden ubiquity. Despite this extensive empirical progress, however, a satisfactory understanding of their behavior is still missing. As neural networks enter more and more into human-facing services (self-driving cars, medical diagnostics, etc.), this status quo, and in particular its safety ramifications, becomes worrisome. This project aims for a theoretical understanding of the foundations of neural networks, divided into three pieces: (a) the representation question regarding which phenomena can be succinctly approximated by neural networks; (b) the optimization question of how to efficiently fit neural networks to data; and (c) the generalization question of why neural networks can fit not only the data they have seen but also the data they have not seen. Developing this understanding will form the core of this project's three broader impacts: (1) the research component will aim to improve safety and reliability of user-facing deployments of neural networks; (2) as an educational component, the research will be simplified and incorporated into freely available course notes; (3) the award supports two outreach efforts co-founded by the PI: UIUC-ML, a university-wide ML seminar; and the Midwest ML Symposium, a yearly Midwest ML gathering.

In more detail, the technical focus of this project, divided into the three learning-theoretic topics above, is as follows. The core representation question is: what makes neural network representation special? Specifically, the proposed representation questions are first to characterize the power gained by adding a single layer to a network, and second to characterize the representation properties of recurrent neural networks, namely neural networks which evolve their state along with a time series they consume. Next comes the topic of optimization, where the key mystery is how neural networks manage to perfectly fit their data with simple iterative descent schemes, despite the apparent nonconvexity of the problem. The plan here is to establish an even stronger property: these iterative schemes manage to output networks which not only fit their data, but do so confidently, in the classical sense of margin theory. Finally, the proposal closes with the topic of generalization. The first goal is to develop refined generalization bounds to the point that they can be algorithmically enforced via effective regularization schemes; the second is to apply these techniques to the fitting of neural networks to probability distributions, specifically the problem of training Generative Adversarial Networks.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
