SHF: Small: Data Movement Complexity: Theory and Optimization
University Of Rochester, Rochester NY
Investigators
Abstract
As computing becomes more powerful, memory is larger and more complex, and data movement increasingly a major source of cost in both time and energy. However, the measurement problem is not well solved. Programmers have to run a program on a machine to measure data movement. After measuring, they are often left with a set of numbers with unclear relationships between them. In parallel executions, the results are also not reproducible. This research first develops an abstract measure of the memory cost called Data Movement Distance (DMD). In addition, it creates a theory of data movement complexity, which can be used for effective program and algorithmic optimization. Programming technology should ensure software portability; therefore modern software does not program data movement explicitly. With the new theory, the cost of data movement is precisely quantified, and the greater precision enables new software optimization to more effectively reduce this cost. Since the new measure is abstract and not machine-specific, the new optimization is machine-agnostic and therefore portable. In the last part, the research develops multiple optimization techniques based on the new theory. By focusing on reducing the cost of data movement, this research targets the cause of the most energy consumption on modern computers, data centers, and computing infrastructure in general. Reducing the energy consumed by computing is urgently needed in response to accelerating climate change. This research develops an abstract measure of the memory cost called Data Movement Distance (DMD). Memory complexity is measured by DMD in the same way time complexity is by the operation count. The research has three parts. The first is DMD complexity analysis, which is both symbolic and asymptotic. DMD is complexity without Big-O. When comparing algorithms, DMD complexity discerns precise constant-factor differences between DMDs. The second is parallel locality analysis for use by loop parallelization. Cache performance has long been a problem both fundamental in understanding the limit of parallel computing and important in practice. The new analysis is used for auto-scaling of parallel code. While auto-parallelization saves the programming effort in creating portable parallel code, auto-scaling saves the testing and tuning time in obtaining portable parallel performance. Finally, two other program-optimization techniques are developed: safe structure splitting in Rust and program symbiosis to improve performance in the shared cache. When used together, these techniques let algorithm and program design target abstract data movement at different levels, all measured by DMD, and hence enable machine-agnostic joint optimization across software layers. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
View original record on NSF Award Search →