CAREER: Towards Exascale Performance of Parallel Applications

$324,734 | FY2024 | CSE | NSF

University of New Mexico, Albuquerque, NM

Investigators

Abstract

Each generation of supercomputers is more powerful than the last, with current systems capable of adding millions of millions of numbers together in a single second. These high-performance machines provide the hardware that scientists and engineers need to solve increasingly complex problems and make discoveries through computer simulations. These simulations run across the thousands of individual compute cores that make up a supercomputer, with each core executing a portion of the program and exchanging messages with other cores as needed. Often, simulations fail to use the computing power of current supercomputers efficiently because of significant communication overheads. This project addresses this challenge by reducing communication costs within widely used parallel programs and improving a spectrum of existing applications to enable novel scientific discoveries. Furthermore, this project will support the revitalization of the CS4ALL course, along with hackathon development, to introduce computing topics to a diverse student population across the University of New Mexico main and branch campuses.

The goals of this project are to minimize communication costs and enhance the performance and scalability of existing parallel applications. This project will develop accurate performance models for emerging heterogeneous architectures to optimize communication within non-linear solvers, simulations, iterative methods, and neural networks. These models will inform several optimization strategies, including graph partitions that minimize performance model-based functions, locality-aware partitioning throughout the widely used algebraic multigrid (AMG) preconditioner, and topology-aware MPI_Allreduce operations. Furthermore, the project will explore specialized optimizations for heterogeneous systems with multiple GPUs per node, including selecting optimal communication paths, utilizing all available CPU cores during communication via threading, and aggregating inter-node messages to reduce data injected into the network. By minimizing communication within foundational numerical methods, this project aims to yield tangible improvements in performance and scalability across a spectrum of applications reliant on these methods.

This project is jointly funded by the Software and Hardware Foundations Core Program and the Established Program to Stimulate Competitive Research (EPSCoR). This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
