Collaborative Research: AF: Small: Adaptive Optimization of Stochastic and Noisy Function

$101,000FY2020CSENSF

Cornell University, Ithaca NY

Investigators

Abstract

The science of artificial intelligence, and the technology of machine learning (ML) in particular, has had a huge impact on modern society. This impact is only expected to grow in the future. At the heart of ML is the process of training the parameters of an intelligent (computer) system, which requires applied-mathematics techniques in the area known as mathematical optimization. The many recent successes of ML, such as in computer vision and natural-language processing, have been made possible with the use of a certain mathematical-optimization algorithm. This algorithm allows the intelligent system to learn through the iterative random selection of data points from within a large-scale dataset. This random sampling is absolutely essential, since otherwise the learning process of any intelligent system would be slowed as the amount of available data increases. However, despite these recent successes, optimization techniques such as this one have fundamental shortcomings that impede them from being effective for next-generation ML tasks. For example, each application of the algorithm requires a careful data-dependent tuning process, which may cause the training of an intelligent system for a single task to require weeks or months of computation on a supercomputer. One avenue for avoiding such computational expense is through the design of optimization techniques that "adaptively" tune themselves. The goals of this project are to design and provide theoretical guarantees for such adaptive algorithms. There have been various previously proposed enhancements and extensions to the aforementioned algorithm, known as the stochastic gradient (SG) algorithm. However, many of these algorithms also possess the shortcoming of being nonadaptive, meaning that their successful application in practice requires expensive "hyperparameter" tuning efforts. The adaptive algorithms considered in this project for the "stochastic optimization" setting of ML are based on the various successful methodologies in the "deterministic optimization" literature. These include so-called "line search" and "trust region" methodologies. However, since neither of these methodologies result in optimal worst-case complexity guarantees, the focus of the project is on the design of adaptive optimal-complexity methods, such as so-called "cubic-regularization" algorithms. The design of adaptive cubic-regularization algorithms for the stochastic setting will be achieved by building on a theoretical framework that views adaptive minimization as a "renewal-reward" stochastic process. This work will combine analytical techniques from the mathematical-optimization and stochastic-process literatures, and will provide a solid theoretical and practical foundation for researchers working in applied mathematics, computer science, statistics, and various engineering fields. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

View original record on NSF Award Search →