RI: Small: An Optimization Framework for Understanding Deep Networks

$458,000 | FY2016 | CSE | NSF

Johns Hopkins University, Baltimore MD

Investigators

Abstract

The past few years have seen a dramatic increase in the performance of pattern recognition systems due to the introduction of deep neural networks. However, the mathematical reasons for this success remain elusive. A key challenge is that the problem of learning the parameters of a neural network is a non-convex optimization problem, which makes finding the globally optimal parameters extremely difficult. Another challenge is that there is currently very limited theory about how the network architecture should be constructed (i.e., number of layers, number of neurons per layer, connectivity patterns, etc.). The goal of this project is to develop an optimization framework that provides theoretical insights for the success of current network architectures and guides the design of novel architectures with guarantees of global optimality.

This project will develop a mathematical framework for the analysis of a broad class of non-convex optimization problems, including matrix factorization, tensor factorization, and deep learning. In particular, this project will study the problem of minimizing the sum of a loss function and a regularization function, both of which can be non-convex, but should satisfy a certain "positive homogeneity" property. By properly designing positively homogeneous regularizers that constrain the "network size," this project aims to show that, under certain conditions, all local minima are globally optimal, and one can find a global minimum from any initialization using a local descent strategy. A deeper understanding of the mathematical properties of deep networks will impact not only machine learning and optimization, where our understanding of non-convex problems continues to be very limited, but also application areas such as computer vision, speech and natural language processing, where deep networks currently give state-of-the-art results.
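The "positive homogeneity" property is not defined in the abstract. As an illustration only, the sketch below assumes the standard definition: a function f is positively homogeneous of degree p if f(αW) = α^p f(W) for every α ≥ 0. It checks this numerically for a bias-free two-layer ReLU network and for one possible regularizer (a sum over hidden units of the product of incoming and outgoing weight norms). The network shape, the regularizer, and all function names are assumptions chosen for the example, not the project's actual formulation.

```python
import random


def relu(z):
    """Elementwise ReLU; positively homogeneous of degree 1."""
    return [max(0.0, t) for t in z]


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def two_layer_relu(W1, W2, x):
    """Hypothetical bias-free network: W2 @ relu(W1 @ x)."""
    return matvec(W2, relu(matvec(W1, x)))


def product_regularizer(W1, W2):
    """Assumed regularizer: sum over hidden units i of
    ||row i of W1||_2 * ||column i of W2||_2."""
    total = 0.0
    for i, row in enumerate(W1):
        norm_in = sum(w * w for w in row) ** 0.5
        norm_out = sum(r[i] ** 2 for r in W2) ** 0.5
        total += norm_in * norm_out
    return total


def scale(W, alpha):
    """Scale every entry of a matrix by alpha."""
    return [[alpha * w for w in row] for row in W]


rng = random.Random(0)
W1 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]  # 4 hidden units, 3 inputs
W2 = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2)]  # 2 outputs
x = [rng.uniform(-1, 1) for _ in range(3)]
alpha = 2.5  # any non-negative scaling factor

out = two_layer_relu(W1, W2, x)
out_scaled = two_layer_relu(scale(W1, alpha), scale(W2, alpha), x)
reg = product_regularizer(W1, W2)
reg_scaled = product_regularizer(scale(W1, alpha), scale(W2, alpha))

# Scaling both weight matrices by alpha scales the network output and the
# regularizer by alpha**2, i.e. both are positively homogeneous of degree 2.
degree = 2
output_ok = all(abs(s - alpha**degree * o) < 1e-9 for s, o in zip(out_scaled, out))
reg_ok = abs(reg_scaled - alpha**degree * reg) < 1e-9
print(output_ok, reg_ok)
```

In this sketch the network output and the regularizer share the same degree of homogeneity, which is the kind of matching the abstract appears to rely on when it speaks of properly designing the regularizer; the precise conditions are those of the project, not of this example.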
