
EAGER: Workload Analysis of NSF Innovative HPC Program Resources

$252,451 | FY2017 | CSE | NSF

SUNY at Buffalo, Amherst, NY

Investigators

Abstract

The NSF Innovative High Performance Computing (HPC) program provides critical computational and data analytics capabilities to thousands of computational scientists conducting leading-edge science and engineering research across the U.S. The portfolio of systems supported by the program is intended to be technically diverse, reflecting the changing and growing use of computation in both research and education. In addition, the program complements and extends the capabilities provided by campus and other regional research cyberinfrastructures. Given the important and unique role that the systems in the program play in the Nation's research portfolio, it is important to have a detailed understanding of their past and current workloads. This project seeks to perform a workload trend analysis for all the systems in the NSF Innovative HPC program. Workload characterization is important because it is an integral part of optimal performance tuning for HPC systems. In addition, examining all the systems within the Innovative HPC program collectively will not only aid holistic capacity planning for the wider HPC ecosystem, but also reveal how the nature of computational science research itself is evolving over time. The workload analysis will address fundamental usage and performance questions such as: How much of the HPC program resources are consumed by high-throughput applications (large numbers of loosely coupled serial, single-node, and small-node-count jobs) and gateway applications, and is this changing over time? What are the characteristics of gateway jobs? How much of the resources are used for data analytics and data-intensive computing? What is the run-time over-subscription capacity for the entire ecosystem? Are there differences in the job mixes among the systems and, if so, how do they affect job throughput?
The study will leverage XDMoD (XD Metrics on Demand), which contains a data warehouse of detailed job-level accounting and performance data for all the resources provided through the NSF Innovative HPC program. The results of this study will not only provide detailed operational and performance analytics for the broad community of users and maintainers of the systems in the program, but will also serve as a template for similar studies carried out on other advanced HPC systems. This transfer of knowledge will be facilitated through the use of Open XDMoD, which is already in wide use by HPC centers worldwide, as well as through presentations at conferences and meetings. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
