Stampede 2: Operations and Maintenance for the Next Generation of Petascale Computing

$36,793,220 | FY2017 | CSE | NSF

University Of Texas At Austin, Austin TX

Abstract

In 2016, the National Science Foundation funded the acquisition of a large, new, forward-looking high-performance computing (HPC) system, Stampede 2, by the Texas Advanced Computing Center (TACC) at the University of Texas at Austin. In partnership with its lead system vendor (Dell), TACC will deploy the Intel-based system in 2017, doubling the capacity of its predecessor, Stampede, by introducing new memory, processor, and interconnect technologies. As Stampede 2 nears its operational deployment, a proposal for operations and maintenance (O&M) of the system was submitted by the University of Texas at Austin. The system is expected to be used as a national resource by thousands of researchers, educators, and students annually. As a critical component of academic infrastructure, it will advance fundamental knowledge in a wide variety of science and engineering frontiers. In addition to continued partnership with Dell, subawards to Clemson University, the University of Colorado, Cornell University, Indiana University, and Ohio State University will ensure a broad national reach of innovative HPC to academia and industry. Stampede 2 will operate within the larger landscape of the nation's research cyberinfrastructure (CI). It joins the set of large-scale computing resources that rely on and benefit from the collaborative user services model of the NSF-funded Extreme Science and Engineering Discovery Environment (XSEDE) project. These accompanying shared services provide for systems allocations, user training, technical interoperability, research and CI community engagement, and access to expertise. Stampede 2 doubles the computing, storage, and networking capacity of the current system, Stampede.
Delivering on the potential of this complex scientific instrument requires knowledgeable and ongoing operations, which include: robust system maintenance, reliability, and availability; security; software configuration and management; efficient utilization; and research workflow optimization. Most significantly, the thousands of users who currently depend on Stampede rely on expert assistance to develop the new skills needed to maximize the value of the new technologies in Stampede 2. These technologies represent the future of large-scale computing. The architecture of Stampede 2 reflects community consensus about HPC's exascale future; while specific technologies are in rapid flux, all paths indicate a transition to more explicit parallelism within applications. Today's applications must adapt, and Stampede 2 offers a bridge to the exascale systems of tomorrow, providing capabilities for exploring new approaches to multiscale (both temporal and spatial) simulations, many forms of data-intensive science, visualization, and data analysis. Stampede 2's operations will also broaden the usage base of HPC, appealing to and supporting a much greater depth and breadth of large-scale computational science for research than any other national system. The Stampede 2 Operations and Maintenance project plan includes world-class operations, user support and training, application tuning and migration, education, outreach, documentation, data management, visualization, analytics-driven application support, and research collaboration. TACC and its team of partners are established CI providers. Collectively, the Stampede 2 operations team will leverage a variety of other NSF-supported projects such as XSEDE, Advanced Cyberinfrastructure Research and Education Facilitators (ACI-REF), and a broad array of scientific software activities. These complementary collaborations further increase the value of the O&M award.
