CRII: SHF: Improving Programmability of GPGPU/NVRAM Integrated Systems with Holistic Architectural Support

$175,000 | FY2017 | CSE | NSF

University Of Southern California, Los Angeles CA

Investigators

Abstract

In the era of big data, industry faces growing demand for higher computing power and for large-capacity, high-performance storage. GPGPU and NVRAM are two prominent technologies that will play a key role in the "Big Data revolution". This project holistically improves the programmability of GPGPU/NVRAM integrated systems and tackles the "programmability bottleneck" that both technologies face. It will make it easier to develop applications for GPGPU and NVRAM that are both correct and high-performing. As a result, the project will encourage the adoption of GPGPUs and NVRAM in a wide range of HPC and big data applications, which could then gain speedups of hundreds of times while ensuring recoverability. Overall, the outcomes of this project will help sustain the performance needed for supercomputing and big data processing in science and engineering (e.g. finance, medical, biology, petroleum, aerospace, and geology).

This project will also contribute to society by engaging high-school and undergraduate students from minority-serving institutions in research, attracting women and under-represented groups to graduate education, expanding the computer engineering curriculum with GPGPU/NVRAM architectures, disseminating research infrastructure for education and training, and collaborating with industry.

This research investigates synergistic approaches to holistically improve the programmability of GPGPU/NVRAM integrated systems, using the following techniques:

(1) Timestamp-Based GPU Coherence Protocol. It avoids storage overhead by not storing sharing states (e.g. Shared, Modified, Exclusive, etc.) or the list of sharers. It reduces traffic overhead by not sending explicit invalidation messages.

(2) Integration of Persistency and Scoped Synchronization. This research aims to study the new notion of Persistent Scope (PS), which incorporates the necessary persistency semantics into the existing scoped synchronization in GPGPU programming models. An efficient architecture design that fully decouples consistency and persistency will be explored.

(3) Data Sharing-Aware CTA Scheduler and Cache Management. This research plans to investigate a sharing-aware scheduler for cooperative thread arrays (CTAs) that attempts to assign CTAs that share data to the same streaming multiprocessor (SM) to improve temporal and spatial locality.
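
The idea behind technique (3) can be illustrated with a small software model. The sketch below is not the project's scheduler; it is a minimal greedy heuristic written under several assumptions: each CTA's data footprint is known up front as a set of block IDs, every SM has a fixed number of CTA slots, and the names `assign_ctas`, `footprints`, `num_sms`, and `slots_per_sm` are hypothetical. A real hardware scheduler would have to infer sharing dynamically or from compiler hints.

```python
from typing import Dict, List, Set


def assign_ctas(
    footprints: Dict[int, Set[int]],  # hypothetical: CTA id -> data block ids it touches
    num_sms: int,
    slots_per_sm: int,
) -> Dict[int, int]:
    """Greedily place each CTA on the SM whose resident CTAs share the most data with it."""
    if len(footprints) > num_sms * slots_per_sm:
        raise ValueError("not enough SM slots for all CTAs")

    sm_data: List[Set[int]] = [set() for _ in range(num_sms)]  # blocks touched per SM
    sm_load: List[int] = [0] * num_sms                         # CTAs resident per SM
    placement: Dict[int, int] = {}

    for cta in sorted(footprints):  # issue CTAs in id order
        blocks = footprints[cta]
        # Only SMs with a free slot are candidates.
        candidates = [sm for sm in range(num_sms) if sm_load[sm] < slots_per_sm]
        # Prefer the largest data overlap, then the lightest load, then the lowest SM id.
        best = max(
            candidates,
            key=lambda sm: (len(blocks & sm_data[sm]), -sm_load[sm], -sm),
        )
        placement[cta] = best
        sm_data[best] |= blocks
        sm_load[best] += 1

    return placement


if __name__ == "__main__":
    # CTAs 0 and 2 overlap on block 2; CTAs 1 and 3 overlap on block 11.
    demo = {0: {1, 2}, 1: {10, 11}, 2: {2, 3}, 3: {11, 12}}
    print(assign_ctas(demo, num_sms=2, slots_per_sm=3))
```

When no candidate SM shares any data with a CTA, the heuristic falls back to the least-loaded SM, so it balances load in the absence of sharing. Ordering the tie-breaks this way is a design choice made for this sketch only.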
