
CAREER: Data-Intensive HPC Analytics: A Systems Approach Through Extended Interfaces, Data Restructuring and Data-centric Scheduling

$454,909 | FY2010 | CSE | NSF

The University Of Central Florida Board Of Trustees, Orlando FL


Abstract

With the advent of emerging e-Science applications, today's scientific research increasingly relies on petascale-and-beyond computing over data sets of petabyte-and-beyond sizes. Representative examples include analytics- and simulation-driven applications such as human vision simulation, astrophysics data analysis, earthquake modeling, and climate modeling using ensemble runs. In many of these fields, scientists analyze large amounts of data to explore new concepts and ideas. These applications make up data-intensive HPC analytics, which lies at the intersection of current HPC and Data-Intensive Scalable Computing (DISC). When HPC systems use traditional configurations to support data-intensive HPC analytics, data is copied from a large remote storage system to diskless compute nodes for processing. Copying data back and forth is expensive and time-consuming. These data-intensive applications do not require compute-intensive resources, but rather machines with moderate compute power and local storage, so that data can be processed in place. One example of this configuration is the Hadoop framework. However, the framework currently has limitations that must be overcome to make Hadoop an effective HPC tool. The investigator is leveraging the Hadoop framework to process large amounts of patterned data in HPC. This research program has three thrusts: extending the MapReduce API to support a wider range of I/O access patterns, developing data restructuring schemes to improve I/O performance for these access patterns, and designing an efficient scheduling scheme that considers multiple chunk locations and data transfer latencies over the network. The research is integrated into several educational activities, such as the development of data-intensive HPC curricula.
