Collaborative Research SI2 SSE: Pipeline Framework for Ensemble Runs on Clouds

$199,725 | FY2012 | CSE | NSF

University Of Miami, Coral Gables FL

Abstract

Cloud computing is an attractive computational resource for e-Science because cores can be accessed easily on demand, and because the virtual machine implementation that underlies cloud computing reduces the cost of porting a numeric or analysis code to a new platform. However, it is difficult to use cloud computing resources for large-scale, high-throughput ensemble jobs. Additionally, computationally oriented researchers are increasingly encouraged to make data sets available to the broader community. To that end, using capture tools during experimentation to harvest metadata and provenance reduces the manual burden of marking up results. Better automatic capture of metadata and provenance is the only means by which sharing of scientific data can scale to keep pace with the rapid growth of data.
This project develops a pipeline framework for running ensemble simulations on the cloud. The framework has two key components: ensemble deployment and metadata harvest. Regarding the former, commercial cloud platforms typically allow far fewer jobs than desired to be started at any one time. An ensemble run must therefore be pipelined to a cloud resource, that is, executed in well-controlled batches over a period of time. We will use platform features of Azure and employ machine learning techniques to continuously refine the pipeline submission strategy and the workflow strategies for ensemble parameter specification, pipelined deployment, and metadata capture. Regarding the latter, we expect to reduce the burden of sharing scientific data sets resulting from the use of cloud resources through automatic metadata and provenance capture, and through a representation that aligns the metadata with emerging best practices in data sharing and discovery.
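The idea of executing an ensemble in well-controlled batches while harvesting provenance can be illustrated with a minimal sketch. It is not the project's framework: the names `RunRecord`, `batched`, `run_pipelined`, and `max_concurrent` are hypothetical, the example parameter `wind_speed` is invented for illustration, and each batch is run sequentially in-process where a real deployment would submit it to a cloud platform.

```python
from dataclasses import dataclass
from itertools import islice
from typing import Any, Callable, Iterable, Iterator


@dataclass
class RunRecord:
    """Hypothetical provenance record captured for one ensemble member."""
    member_id: int
    params: dict
    batch: int
    output: Any = None


def batched(items: Iterable, size: int) -> Iterator[list]:
    """Yield successive lists holding at most `size` items."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def run_pipelined(
    members: Iterable[dict],
    run_member: Callable[[dict], Any],
    max_concurrent: int,
) -> list:
    """Run ensemble members batch by batch, recording provenance for each.

    `max_concurrent` stands in for the platform's limit on jobs that can be
    started at one time (an assumption; the real limit is platform-specific).
    """
    records = []
    numbered = enumerate(members)
    for batch_no, batch in enumerate(batched(numbered, max_concurrent)):
        # A real pipeline would submit this batch to the cloud and wait;
        # here each member runs locally, one after another.
        for member_id, params in batch:
            output = run_member(params)
            records.append(RunRecord(member_id, dict(params), batch_no, output))
    return records


if __name__ == "__main__":
    ensemble = [{"wind_speed": w} for w in (30, 40, 50, 60, 70)]
    for record in run_pipelined(ensemble, lambda p: p["wind_speed"] * 2, 2):
        print(record)
```

Recording the parameters and batch number alongside each output is one simple way to keep lineage attached to results as they are produced, rather than reconstructing it by hand afterwards.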
Ensemble simulations result in complex data sets, whose reuse could be increased by expressive granule- and collection-level metadata, including the lineage of the resulting products, which contributes to trust.
In this project we focus on a compelling and timely application from climate research. One of the more immediate and dangerous impacts of climate change could be a change in the strength of storms that form over the oceans. In addition, as sea level rises due to global warming and melting of the polar ice caps, coastal communities will become increasingly vulnerable to storm surge. There have already been indications that even modest changes in ocean surface temperature can have a disproportionate effect on hurricane strength and the damage inflicted by these storms. In an effort to understand these impacts, modelers turn to predictions generated by hydrodynamic coastal ocean models such as the Sea, Lake and Overland Surges from Hurricanes (SLOSH) model. The proposed research advances the knowledge and understanding of probabilistic storm surge products through enhancements to the SLOSH model itself and through mechanisms that take advantage of commercial cloud resources. This knowledge is expected to have application in research, the classroom, and operational settings.
The broader significance of the project is several-fold. Cloud computing is an important economic driver, but it remains difficult to use in computationally driven scientific research. This project lowers the barriers to conducting e-Science research that utilizes cloud resources, specifically Azure. It will contribute tools to help researchers share, preserve, and publicize the scientific data sets that result from their research.
Because we focus on and improve an application that predicts storm surge in response to sea level changes and severe storms, our work contributes to societal responses and adaptations to climate change, including planning and building the sustainable, hazard-resilient coastal communities of the future.
