OAC Core: Small: Higher Order Solvers for Training Machine Learning Models
Purdue University, West Lafayette IN
Abstract
Machine learning (ML) techniques have emerged as a key enabling technology for a broad class of applications, from business enterprises to engineering design. These techniques rely on complex models that must be suitably trained on large amounts of data. Training takes the form of a mathematical optimization problem that minimizes the error between the model's output and the known output on the training data. Because of the large number of degrees of freedom in the model, the complexity of the objective function being minimized, and the volume of training data, training ML models effectively and efficiently is a critical step in machine learning. The goal of this project is to develop novel optimization techniques, implement them on large-scale parallel platforms with GPU accelerators, validate them in the context of diverse ML applications, and build highly optimized, robust, and usable software tools and libraries. These software tools will be specialized to various ML models and incorporated into commonly used frameworks such as TensorFlow, making them seamlessly accessible to a very large and diverse user community. The robustness, performance, and scalability of the software provide unique capabilities, with the potential to redefine the state of the art in ML applications by supporting significantly more complex ML models, improving generalization from training to test data, and significantly reducing training time. Building on these intellectual and broader-impact goals, the project integrates a number of activities aimed at broadening participation and creating educational opportunities and content.
These include summer schools that channel undergraduate students into research careers, research opportunities for undergraduates during the school year, development of new educational material that integrates learning with hands-on use of the software, and motivation of novel formulations and methods in machine learning.
The technical goals of the project are accomplished through a combination of novel numerical methods, statistical sampling techniques, highly scalable parallel implementations, and efficient use of GPUs. The project has the following specific aims:
(i) development of second-order Newton-type methods for non-convex problems, focusing on Trust Region (TR) and Cubic Regularization (CR) methods that rely on approximations to the Hessian and Fisher information matrices to deliver highly efficient solvers;
(ii) development of a complete Higher Order Optimization Procedures (HOOP) toolkit, including unbiased and biased sampled Hessians, block-diagonal approximations of the Fisher matrix, efficient and effective preconditioners for the Conjugate Gradient (CG) and CG-Steihaug solvers, and problem-specific optimizations;
(iii) development of efficient parallel methods that combine the Alternating Direction Method of Multipliers (ADMM) with parallel matrix solvers on scalable hardware platforms with GPU accelerators, and integration of the software into TensorFlow. The software will also be made available as containerized executables that clients can instantiate with minimal effort, as libraries for building new ML applications, and as web-accessible services for education and training; and
(iv) demonstration of the effectiveness of the new methods on important application classes, including the solution of large-scale semidefinite programs (SDPs), matrix factorization and distance metric learning problems, and the training of deep neural networks.
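To make aims (i) and (ii) more concrete, the following is a minimal NumPy sketch of a trust-region method whose inner solver is CG-Steihaug and whose Hessian is sub-sampled. It is an illustration under stated assumptions, not the project's HOOP implementation: it assumes a finite-sum objective, uses a sigmoid least-squares loss as a stand-in non-convex problem, and every function name (`steihaug_cg`, `subsampled_trust_region`, etc.), the sample size, and the acceptance and resizing thresholds (0.1, 0.25, 0.75) are hypothetical choices. Each outer step approximately solves min over s of g^T s + (1/2) s^T H_S s subject to ||s|| <= delta, where g is the full gradient and H_S averages per-sample Hessians over a random subset S. Preconditioning, Fisher-matrix approximations, ADMM, and GPU execution are all omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(x, A, b):
    """Stand-in non-convex loss: f(x) = mean_i (b_i - sigmoid(a_i^T x))^2."""
    return np.mean((b - sigmoid(A @ x)) ** 2)

def gradient(x, A, b):
    """Full gradient of the loss over all n samples."""
    s = sigmoid(A @ x)
    ds = s * (1.0 - s)                          # sigmoid'(z)
    return A.T @ (2.0 * (s - b) * ds) / A.shape[0]

def subsampled_hess_vec(x, A, b, idx):
    """Return v -> H_S v, with H_S the average per-sample Hessian over rows idx."""
    A_s, b_s = A[idx], b[idx]
    s = sigmoid(A_s @ x)
    ds = s * (1.0 - s)                          # sigmoid'(z)
    dds = ds * (1.0 - 2.0 * s)                  # sigmoid''(z)
    w = 2.0 * (ds ** 2 + (s - b_s) * dds)       # can be negative, so H_S may be indefinite
    return lambda v: A_s.T @ (w * (A_s @ v)) / len(idx)

def to_boundary(s, d, delta):
    """Positive tau with ||s + tau * d|| = delta (s lies inside the region)."""
    a, b2, c = d @ d, 2.0 * (s @ d), s @ s - delta ** 2
    return (-b2 + np.sqrt(b2 * b2 - 4.0 * a * c)) / (2.0 * a)

def steihaug_cg(g, hv, delta, tol=1e-10, max_iter=100):
    """CG-Steihaug: approximately minimize g^T s + 0.5 s^T H s s.t. ||s|| <= delta."""
    s, r = np.zeros_like(g), g.copy()           # residual r = H s + g
    d = -r
    if np.linalg.norm(r) < tol:
        return s
    for _ in range(max_iter):
        Hd = hv(d)
        dHd = d @ Hd
        if dHd <= 0.0:                          # negative curvature: stop on the boundary
            return s + to_boundary(s, d, delta) * d
        alpha = (r @ r) / dHd
        s_new = s + alpha * d
        if np.linalg.norm(s_new) >= delta:      # CG step would leave the trust region
            return s + to_boundary(s, d, delta) * d
        r_new = r + alpha * Hd
        if np.linalg.norm(r_new) < tol:         # converged inside the region
            return s_new
        d = -r_new + ((r_new @ r_new) / (r @ r)) * d
        s, r = s_new, r_new
    return s

def subsampled_trust_region(A, b, x0, iters=50, sample=64, delta=1.0, seed=0):
    """Trust-region loop: full gradient, sub-sampled Hessian-vector products."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(iters):
        g = gradient(x, A, b)
        idx = rng.choice(A.shape[0], size=min(sample, A.shape[0]), replace=False)
        hv = subsampled_hess_vec(x, A, b, idx)
        s = steihaug_cg(g, hv, delta)
        pred = -(g @ s + 0.5 * s @ hv(s))       # decrease predicted by the quadratic model
        actual = loss(x, A, b) - loss(x + s, A, b)
        rho = actual / pred if pred > 0 else -1.0
        if rho > 0.1:
            x = x + s                           # accept: the loss actually decreased
        if rho > 0.75 and np.isclose(np.linalg.norm(s), delta):
            delta *= 2.0                        # good model and step hit the boundary
        elif rho < 0.25:
            delta *= 0.5                        # poor model agreement: shrink the region
    return x
```

For example, with `A` a random 500×10 matrix and `b = sigmoid(A @ x_true)`, calling `subsampled_trust_region(A, b, np.zeros(10))` should end with a lower `loss` than at the starting point, because a step is accepted only when it actually reduces the loss. The sketch sub-samples only the Hessian and keeps the full gradient, which is one common variant; the `sample` parameter then controls the cost of each Hessian-vector product inside CG.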
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.