
AF: Small: Collaborative Research: Algorithmic and Computational Frontiers of MapReduce for Big Data Analysis

$114,469 · FY2018 · CSE · NSF

Carnegie Mellon University, Pittsburgh PA


Abstract

Modern science and engineering rely heavily on processing massive data sets, and the size of the data requires applications to run on distributed computing frameworks. However, many existing methods essential to these applications are not easily adapted to distributed settings. This project aims to develop new, efficient ways of processing large data sets on widely used distributed computing platforms. The project will yield new methods for processing diverse and complex data sets of massive size and allow various applications to scale to large inputs. The work has the potential to fundamentally change the algorithmic techniques used in distributed computing, helping to shape big data research, the computing industry, and the growing economy that relies on big data analysis. Research outcomes will be integrated with education through an extensive survey/tutorial on the core algorithmic ideas behind the new discoveries, making those ideas transparent to algorithm developers and practitioners. The PIs will make some of the discovered algorithmic ideas accessible even to undergraduate students, helping them prepare for the algorithmic challenges of distributed computing on large data sets. Special efforts will be made to include women and minorities in advising and mentoring plans.

The main goal of the project is to find new ways of unlocking the underlying power of MapReduce, a popular distributed platform, through the development of new algorithms. The developed algorithms should have provably strong guarantees and demonstrate their effectiveness in empirical experiments. Given the increasing demand for large-scale data analysis, establishing a solid theoretical model of MapReduce and developing new algorithmic ideas could yield faster and more memory-efficient algorithms for distributed computing.
The PIs will study, in the MapReduce setting, a collection of carefully chosen problems that not only have strong connections to theoretical work but also have the potential for high impact in real-world Big Data applications: clustering, distributed dynamic programming, and the limitations of MapReduce. In parallel, the PIs will work to better understand the currently accepted MapReduce models and possibly refine them to connect the models more closely with practice. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
