EAGER: Towards robust, interpretable deep learning via communication theory and neuro-inspiration
University Of California-Santa Barbara, Santa Barbara CA
Investigators
Abstract
Deep neural networks (DNNs) are attaining great success in an increasing array of applications, yet there remain persistent concerns regarding their lack of interpretability and robustness. The standard approach to training DNNs is to optimize an end-to-end cost function based on variants of gradient descent. This simple approach is flexible, allowing innovation in architectures and applications, and scaling to neural networks with a large number of parameters, given enough data and computational power. Such end-to-end, or top-down training, however, does not provide control over, or understanding of, the features being extracted by the layers of the neural networks. The vulnerability of DNNs to adversarial attacks, for example, is a symptom of this phenomenon. The proposed research seeks to address these drawbacks using ideas from communication theory and neuroscience: the goal is to actively shape the features being extracted by individual layers of the neural network, in addition to training the overall network to attain an end-to-end goal. This research will contribute to curricular enhancements in signal processing and machine learning explored via courses, REU projects and senior capstone projects. The proposed technical approach leverages the existing computational infrastructure for training, while imposing layer-by-layer constraints aimed at producing sparse, strong activations. Drawing on ideas from communication theory, the goal is to learn “matched filters” which enhance the “signal-to-noise ratio (SNR)” at neuron outputs at each layer. One may show that this approach is consistent with Hebbian and anti-Hebbian (HAH) learning as posited in neuroscience, in which neurons that are strongly activated for an input are promoted, with less active neurons being demoted. This work posits enhanced robustness via such an SNR-maximizing strategy, together with additional nonlinear transformations such as divisive normalization borrowed from neuroscience. While preliminary visualizations indicate more interpretable neurons, there is reason to expect sparse, strong activations to be more amenable to quantitative interpretation via statistical and information-theoretic analysis. The goal of the proposed research is two-fold: to gain theoretical insight into HAH-based learning via toy models, and to demonstrate practical gains in robustness and interpretability relative to state of the art DNNs. Experimental evaluations will initially be conducted on image datasets which provide standard performance benchmarks. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
View original record on NSF Award Search →