SDCI HPC Improvement: High-Productivity Performance Engineering (Tools, Methods, Training) for NSF HPC Applications
University Of Oregon Eugene, Eugene OR
Investigators
Abstract
Intellectual Merit: The promise of high-performance computing (HPC) will be realized by science and engineering (S&E) applications executing on scalable HPC computer systems at the high end of their performance range. Performance optimization of S&E application codes will be achieved through a process of performance engineering, where tools for parallel performance measurement, analysis, and tuning are used productively to discover sources of performance inefficiency and remove them. Parallel performance tools research and development has created powerful techniques for performance observation, analysis, and optimization, and produced technology solutions that are portable, interoperable, and scalable. It is now important to transfer successful, robust parallel performance infrastructure to a performance engineering framework, integrated with HPC cyberinfrastructure and directed at documented user requirements for HPC performance problem solving. In addition, if the use of HPC resources is to be maximized, human-centric investments must also be made to help train application developers to be good performance engineers.
Broader Impact: This performance software foundation will be complemented by a community-driven education and training initiative to increase human productivity in performance engineering efforts across multiple S&E fields. The proposed project will also create a training program for performance technology and engineering, which will be piloted and refined at the Pittsburgh Supercomputing Center and integrated with the TeraGrid Education, Outreach, and Training (EOT) program over time. This program's objectives will be to educate application developers and students in sound performance evaluation methods, to teach them best practices for engineering high-performance code solutions based on expert tuning strategies, and to train them to use the performance tools effectively.
The project will develop training materials and infrastructure for distributed access, and will institute a series of tutorials and bring-your-own-code workshops offered in person and over the AccessGrid. In addition, application engagement will be an important component of this activity. The project will work directly with undergraduate and graduate students on performance analysis of S&E applications, and with developers of leading large-scale applications to integrate performance engineering into their projects. A performance repository containing detailed characterization data for a broad set of applications and platforms will be created and made available across all HPC centers for performance data mining. Project success will be measured by three metrics: the improvements in application performance achieved on high-impact S&E applications, the increased performance competency of application developers across S&E domains, and the acceptance and ubiquity of the performance infrastructure among the NSF Track 1 and Track 2 centers.