
EAGER: NDN-Hadoop: Exploring Applicability of NDN for Big-Data Computing

$232,000 | FY2015 | CSE | NSF

University Of Arizona, Tucson AZ

Investigators

Abstract

Large-scale distributed processing of huge amounts of data is the underpinning technology of the Big Data era. Apache Hadoop, an open-source software platform, has become the go-to technology for running many types of distributed applications, such as Web indexing, data mining, business intelligence analysis, machine learning, scientific simulation, and bioinformatics research. This wide range of applications requires Hadoop to be flexible and adaptive to different network environments while providing high performance for each application. The result is complicated protocol design and implementation, a large number of parameters to tune, and often unsatisfactory performance in real deployments. An emerging network architecture, Named Data Networking (NDN), shifts the focus from point-to-point communication to general content retrieval, which is a much better fit for Big Data computing than traditional TCP/IP, the communication protocol of the Internet. NDN can potentially offer tremendous benefits by simplifying deployment and improving the robustness and performance of Hadoop systems. This project investigates the feasibility of integrating NDN and Hadoop to build an NDN-Hadoop system that will provide a robust, efficient, and scalable foundation for Big Data computing. Since Hadoop is a complicated ecosystem and NDN is being actively developed, there are many questions to be answered regarding how they may work together and how to exploit the potential benefits to the greatest extent. More specifically, this project will (1) evaluate the applicability of NDN in Hadoop infrastructure by understanding its benefits and quantifying the performance gains of Hadoop running on top of NDN, (2) design novel Hadoop mechanisms that take full advantage of NDN and the information it provides for better performance and reliability, and (3) develop and test a complete system of Hadoop running on top of NDN to create a platform for future research.
Introducing NDN into datacenter-scale networking and computing will have profound impacts on the design of large data centers, on distributed computing infrastructures, and on multiple research communities. The project will produce the first data-centric Hadoop system, which will benefit all kinds of applications that use Hadoop. The results can also be translated to other Big Data computing platforms, broadening the impact even further. The project provides not only new research directions to the networking and systems communities, but also performance enhancements to the scientific computing and data analytics communities. The experience gained in this project will feed back into NDN research, contributing to the development of the future Internet. The project also offers education and training opportunities for graduate and undergraduate students.
