Spatio-Temporal Memory Streaming
Carnegie Mellon University, Pittsburgh PA
Abstract
Project Abstract for 0702658: Technological advancements in semiconductor fabrication have led to an abundance of on-chip transistors, faster clock speeds, and unprecedented processor performance. In contrast, while DRAM capacity has increased commensurately, DRAM speeds have lagged behind, resulting in an ever-increasing processor/memory performance gap. Conventional approaches that bridge the speed gap with a hierarchy of "cache" memories, where at every level cache size is traded off for speed, have reached diminishing returns. Cache hierarchies have become increasingly ineffective at hiding memory latency for important classes of commercial and scientific workloads. For example, in modern servers (e.g., transaction processing or web servers) the processors are idle more than 50% of the time waiting for memory. Processor-centric proposals to bridge the gap (e.g., large-window or run-ahead execution) rely on high inherent memory-level parallelism in applications, which is unfortunately absent in many such workloads because their memory accesses depend on one another (e.g., pointer chasing in linked data structures).

In this project, a novel memory system architecture, called Spatio-Temporal Memory Streaming (STeMS), is developed in which memory moves in correlated groups (called spatio-temporal streams) rather than as individual cache blocks. Moving memory in streams enhances fetch lookahead and memory-level parallelism, hides memory latency, and improves on-chip storage utilization and pin bandwidth. STeMS capitalizes on the observation that memory access patterns, while they may be arbitrarily irregular, are highly repetitive due to iterative program control flow and infrequent structural changes to data in memory. To enhance memory-level parallelism, STeMS is designed to extract repetitive temporally and spatially correlated "streams" of instructions and data corresponding to data structure traversals at program sites.
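The temporal half of this idea can be illustrated with a small software model: record the global order of misses and, when a miss repeats, fetch the addresses that followed it last time. This is a minimal sketch under stated assumptions, not the STeMS design; the class `TemporalStreamer`, the function `covered_misses`, the `lookahead` parameter, the unbounded instant prefetch buffer, and the node addresses are all hypothetical.

```python
from typing import Dict, List, Set


class TemporalStreamer:
    """Toy temporal-stream replay model (hypothetical; not the STeMS hardware design).

    Every miss address is appended to a global history log. When an address
    misses again, the addresses that followed its previous occurrence are
    returned as the predicted stream to fetch ahead of time.
    """

    def __init__(self, lookahead: int = 4) -> None:
        self.lookahead = lookahead          # addresses replayed per trigger (assumed value)
        self.history: List[int] = []        # global miss-order log
        self.last_pos: Dict[int, int] = {}  # address -> index of its latest occurrence in history

    def on_miss(self, addr: int) -> List[int]:
        """Log `addr` and return the stream that followed it last time (empty if unseen)."""
        pos = self.last_pos.get(addr)
        stream = [] if pos is None else self.history[pos + 1 : pos + 1 + self.lookahead]
        self.last_pos[addr] = len(self.history)
        self.history.append(addr)
        return stream


def covered_misses(miss_trace: List[int], lookahead: int = 4) -> int:
    """Replay a baseline miss trace and count misses already fetched by an earlier stream.

    The prefetch buffer is unbounded and prefetches complete instantly; both are
    simplifying assumptions of this toy model.
    """
    streamer = TemporalStreamer(lookahead)
    prefetched: Set[int] = set()
    covered = 0
    for addr in miss_trace:
        if addr in prefetched:
            covered += 1              # this miss would have been eliminated
            prefetched.discard(addr)
        prefetched.update(streamer.on_miss(addr))
    return covered


# A pointer-chasing traversal (hypothetical node addresses) executed twice.
traversal = [0x1040, 0x7A80, 0x02C0, 0x9900]
print(covered_misses(traversal * 2))  # 3
```

In the first traversal every miss is compulsory. On the second, the repeated miss on the head node replays the recorded stream, so the three dependent misses that follow it are fetched in parallel rather than one at a time. The spatial half (fetching groups of blocks within a memory region together) is not modeled here.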
Preliminary results indicate that a STeMS-based system can eliminate over 60% of shared cache misses in online transaction processing server software.