
DC: Medium: Tackling and Understanding Intermediate Data in Cloud Applications as a First-Class Citizen

$600,000 | FY2010 | CSE | NSF

University Of Illinois At Urbana-Champaign, Urbana IL

Abstract

Cloud computing infrastructures involve thousands of servers, petabytes of storage, and hundreds of users running various applications that process gigabytes to terabytes of data. This project focuses on the intermediate data generated during the execution of parallelized dataflow programs in clouds. Such intermediate data has several distinctive characteristics: it is massive in scale, distributed, subject to computational barriers, and, when lost to server failures, prolongs job run-times. Further, the size of intermediate data in a cloud application is often comparable to or larger than the input or output data size, and can thus reach terabytes. In spite of extensive existing work on traditional storage problems, there is therefore a critical need for new algorithms and systems that target cloud intermediate data. This project is the first to treat cloud intermediate data as a first-class citizen. The project will involve new algorithm design and analysis, original systems building and implementation, deployment in real-world testbeds, and measurement studies. Concretely, this project will build a new system that explicitly manages intermediate data in cloud dataflow programs in order to improve their fault-tolerance, and will design and realize barrier relaxation strategies to improve the performance of cloud programs. We will implement our systems using open software, then deploy and experimentally evaluate them atop the NSF infrastructure called the Cloud Computing Testbed (CCT), which is hosted at the University of Illinois. Finally, we will perform measurement studies of the workload characteristics of cloud intermediate data. A fuller understanding of intermediate data in clouds can spawn research in managing cloud infrastructures, improve the run-time performance of cloud applications, and lead to new cloud programming paradigms.
Our contributions will directly improve the performance and fault-tolerance of applications run on the CCT community infrastructure, and will positively influence the design and deployment of existing and emerging industry clouds. Our results will be published and released as open software and datasets.