CAREER: Compiler and Runtime Support for Multi-Tasking on Commodity GPUs

$517,545FY2018CSENSF

Colorado School Of Mines, Golden CO

Investigators

Abstract

General-purpose Graphics Processing Units (GPU) computing has become mainstream, as witnessed in various domains such as machine learning, graph analytics, and scientific simulation. One notable trend is employing GPUs in data centers and cloud computing infrastructures to satisfy users' increasing demand to accelerate their applications. In such multi-tasking environments, applications from different users contend to use the shared GPU, leading to unpredictable and unacceptable performance degradation. This CAREER project aims at developing a set of compiler and runtime techniques to support multi-tasking on commodity GPUs in a transparent and efficient manner. The compiler techniques circumvent the hardware limitations to enable a set of features, such as preemption, and the runtime system schedules applications to utilize the potential of the GPU and guarantees quality of service. In addition, the investigator advances GPU education in the University to target both Computer Science (CS) and non-CS students based on a GPU education center. Specifically, the project investigates how to integrate compiler and runtime techniques to support multi-tasking on GPUs by building a system that achieves three goals. First, the system addresses GPU core contention by enabling flexible GPU kernel preemption. The compiler transforms the GPU program to be a preemptable form by circumventing the limitation imposed by the hardware thread scheduler. The runtime intercepts all GPU kernel launch requests and makes global preemption and scheduling decisions to maximize performance. Second, the system supports fine-grained sharing for threads from different applications to fully utilize hardware resources within GPU streaming multi-processors. The runtime guarantees the QoS of user-facing applications while optimizing overall throughput aided by performance prediction. Third, the system addresses GPU memory contention by coordinating GPU memory transfers, which considers memory access patterns and array reuse patterns. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

View original record on NSF Award Search →