Architecture Support for Advancing PGAS (ASAP)

$229,983FY2015CSENSF

George Washington University, Washington DC

Investigators

Abstract

Architectures of parallel computers and modern manycore processor chips, that contain tens and eventually hundreds of processors, are becoming quite complex for the programmers. Efficient programming of those systems is very important to achieve high-speed of processing while reducing power. To do so, programmers must ensure that: 1. the work in the program is broken into as many parallel activities as possible for fast processing; 2. data is located close to the processing cores that will manipulate them to avoid making the data travel long in the system thereby losing more time and wasting energy. Programming models are abstractions that provide the programmer with an easy-to-use logical view that hides the complexity of the underlying systems, while facilitating efficient programming. These are two conflicting requirements and while the current de facto programming methods can offer efficiency in the majority of the cases, they are not easy to use. The so called, PGAS or the Partitioned Global Address Space programming model has the promise of striking a balance between efficiency and ease-of-use. However, help is needed from the hardware particularly in simplifying and speeding up the process of finding where the data to be processed is physically located. This work is to investigate hardware solutions for this problem under the PGAS programming model. The outcome will improve productivity of domain scientists, thereby reducing the time from conceiving an application problem till the solution is attained, which in the long run can mean more rapid discoveries and innovations, as well reduction in the cost of developing the next generation software. The PIs propose to investigate a general hardware support for address translation for the PGAS programming model. PGAS strikes a balance between the locality-aware, but explicit, message-passing model (e.g. MPI) and the easy-to-use, but locality-agnostic, shared memory model (e.g. OpenMP). However, the PGAS rich memory model comes at a performance cost which can hinder its potential for scalability and performance. Current implementations can be orders of magnitude slower in accessing local shared space as compared to accessing their private space. Compiler optimizations only handle special cases and hand-tuning renders the PGAS ease-of-use advantage worthless. The proposed hardware solution can facilitate high-performance execution for out-of-the-box (i.e. non-hand-tuned) PGAS applications. The PIs are creating PAGS memory model translation architectural support, which can navigate the PGAS memory model converting PGAS shared references to system's virtual addresses efficiently on-the-fly. This eliminates the need for hand-tuning, while maintaining the performance and productivity of PGAS languages. The hardware support will be available to the compiler through instruction set extensions. A tool set integrating and adapting existing micro-architecture simulators with compiler and a run-time system will be used as the main testbed and distributed over a cluster for extensive experimentation. At the end of the project the PIs expect to release tools and the benchmarks utilized under this project.

View original record on NSF Award Search →