The Heavy-Tailed Methods in Machine Learning

$163,533 · FY2022 · MPS · NSF

Florida State University, Tallahassee FL

Investigators

Abstract

Stochastic gradient descent and its variants are core algorithms for solving machine learning problems and work remarkably well in practice. However, a general theory that explains their success is still lacking. One popular approach is to impose structure on the gradient noise, typically modeled by Gaussian or other light-tailed distributions. However, many empirical and some recent theoretical works challenge these assumptions, calling for an understanding of heavy-tailed distributions and the resulting phenomena in machine learning. In this project, a theoretical framework will be built towards understanding and explaining why and how heavy-tailed distributions arise in popular machine learning algorithms, and how heavy tails can better explain their success, bridging a gap between theory and practice. The results derived from this project are expected to impact the mathematics community as well as developers and practitioners in the data science and machine learning communities.

In this project, theoretical convergence properties and performance guarantees will be obtained for heavy-tailed stochastic gradient descent, its accelerated momentum-based variants, and their continuous-time approximations. Further theoretical properties such as metastability will be studied to gain a deeper understanding of these heavy-tailed methods. A novel heavy-tailed adaptive Langevin algorithm and its variants will be developed, and their theoretical guarantees will be studied for both sampling and non-convex stochastic optimization. Such an objective requires combining a broad set of ideas and mathematical tools from applied probability, continuous optimization, statistics, and numerical analysis. Based on such mathematical developments, the project will develop and study heavy-tailed algorithms with theoretical guarantees that can solve large-scale machine learning problems, ultimately building up a mathematical theory to explain the cause and implications of heavy-tailed distributions and other important phenomena that arise in machine learning.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.