CPA-CPL: Exploring and Exploiting Heterogeneous Cache Sharing in Chip Multiprocessors Systems for Locality Optimization and Proactive Cache Management

$290,000 | FY2008 | CSE | NSF

College of William and Mary, Williamsburg, VA

Investigators

Abstract

The growing problems of power consumption, heat dissipation, and design complexity have shifted processor technology toward multicore processors. Along with that shift, sharing in the memory hierarchy has become deeper, more heterogeneous, and more complex, causing cache contention and increased conflicts, but also synergistic sharing. Because the implications of this change are not well understood, current multicore systems suffer from considerable performance degradation, poor performance isolation, and weak fairness guarantees. These issues become more urgent as the degree of processor-level parallelism increases rapidly. Prior studies, mostly in architecture and operating systems, rely on simple heuristics to estimate the cache requirements of corunning programs; their inaccuracy and overhead limit their scalability and effectiveness. This work takes a distinct approach, tackling these challenges from the compiler side by constructing predictive behavior models for corunning processes, developing cache-sharing-aware program transformations and loop scheduling, and combining the program-level knowledge of programming systems with proactive resource management by runtime systems. Specifically, this work proposes inclusive reuse signatures to characterize inclusive locality (the memory behavior of corunning programs on shared caches) and inter-thread affinity models to capture data locality among parallel threads. It addresses the challenges of measuring, predicting, and exploiting inclusive locality. The analysis opens new opportunities for shared-cache optimizations by both compilers and runtime systems. The PI develops a series of program transformations, such as inter-thread memory reorganization and cache-sharing-aware loop scheduling, to increase inter-thread spatial locality and reduce conflicts, contention, and false sharing.
For runtime systems, this work introduces proactive cache management, which partitions caches or schedules processes ahead of time according to predicted inclusive locality, overcoming the limitations of current reactive schemes in scalability, accuracy, and effectiveness.
