CRII: CIF: New Paradigms in Generalization and Information-Theoretic Analysis of Deep Neural Networks
Cornell University, Ithaca NY
Investigators
Abstract
Over the past decade, deep learning (DL) has become the method of choice for various machine learning tasks. The realm of DL applications constantly expands, now including autonomous vehicles, robotic-assisted surgery, medical imaging, and many others. Wide societal acceptance of such technologies relies on the ability of humans to understand and trust them. Unfortunately, the exceptional practical effectiveness of DL systems is not coupled with a comprehensive theory that explains how they operate and why they are so successful on real-world data. This state of affairs obstructs a wider deployment of AI for the applications described above. To break this impasse, this project seeks to look under the hood of the Deep Neural Networks (DNNs) that enable DL and elucidate how information is processed in these systems. Doing so would make the decisions of AI mechanisms more transparent to end users and other stakeholders, thus improving their understanding of these systems. Via rigorous performance guarantees, this project also aims to characterize the circumstances under which deep learning systems are guaranteed not to fail. These advances will set the stage for the integration of high-performance AI systems into our daily lives, unlocking their potential impact.
The project tackles key challenges in DL theory via a novel information-theoretic approach. The main objective is to shed light on the process by which DNNs progressively build representations, from crude and overly redundant representations in shallow layers to highly clustered and interpretable ones in deeper layers, and to give the designer more control over that process. To that end, three synergistic thrusts are pursued. The first develops novel complexity measures of internal representations by quantifying the flow of information through the DNN. Crucially, these measures are designed for efficient computation over layer dimensionalities typical of state-of-the-art networks for computer vision, speech, and text processing. The second thrust focuses on relating the developed complexity measures to the generalization capability of the network via new instance-dependent generalization bounds. The goal here is to provide performance guarantees for a given DNN in terms of efficiently computable figures of merit. Lastly, the developed machinery is further leveraged to construct tools for pruning redundant neurons and layers, visualizing the DNN's operation, and advancing DNN interpretability. Altogether, this research strives to move the current uncertain, trial-and-error process of DNN design toward the domain of deterministic engineering practice.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.