
BIGDATA: Collaborative Research: F: Nomadic Algorithms for Machine Learning in the Cloud

$596,326 | FY2016 | CSE | NSF

University of California-Santa Cruz, Santa Cruz, CA


Abstract

With an ever-increasing ability to collect and archive data, massive data sets are becoming increasingly common. These data sets are often too big to fit into the main memory of a single computer, so there is a great need to develop scalable and sophisticated machine learning methods for their analysis. In particular, one has to devise strategies to distribute the computation across multiple machines. However, the stochastic optimization and inference algorithms that are so effective for large-scale machine learning appear to be inherently sequential. The main research goal of this project is to develop a novel "nomadic" framework that overcomes this barrier. This will be done by showing that many modern machine learning problems have a certain "double separability" property. The aim is to exploit this property to develop convergent, asynchronous, distributed, and fault-tolerant algorithms that are well suited to achieving high performance on the commodity hardware prevalent on today's cloud computing platforms. In particular, over a four-year period, the following will be developed: (i) parallel stochastic optimization algorithms for the multi-machine cloud computing setting, (ii) theoretical guarantees of convergence, (iii) open-source code under a permissive license, and (iv) applications of these techniques to a variety of problem domains such as topic models and mixture models. In addition, a cohort of students who can transfer their skills to both industry and academia will be trained, and a graduate-level course on scalable machine learning will be developed. The proposed research will enable practitioners in different application areas to quickly solve their big data problems. The results of the project will be disseminated widely through papers and open-source software. Course material will be developed for the education of students in the area of scalable machine learning, and the course will be co-taught at UCSC and UT Austin.
The project will recruit women and minority students.
