CSR: Small: Cache-Coherent Accelerators for Efficient Persistent Memory Programming

$592,937 | FY2023 | CSE | NSF

University of Utah, Salt Lake City, UT

Investigators

Abstract

Persistent memory (PM) is a new class of computer storage that upends the model that computer systems have used for more than half a century. Unlike conventional storage devices, PM can be accessed by CPUs as if it were memory. Because this access path is implemented in hardware rather than software, PM can be accessed more quickly and efficiently than conventional storage devices, yet its contents persist across faults and power failures. This proposal seeks to convert existing software to use this faster PM-based storage without requiring programmers to change their code, while keeping data safe during crashes and power failures. Unlike existing approaches, the proposed approach does this using emerging, commercially available hardware; hence, it is fast and efficient, since it does not reintroduce software into the CPU's storage access path. The improved memory performance, efficiency, and capacity that PM provides, combined with this project's accelerated crash consistency, will have broad benefits for many private and public sector applications. Many of the costs that virtually all database systems incur to survive crashes and power failures can be mitigated by the proposed approach. Furthermore, applications that process and modify massive data sets in real time, including data center applications, social networks, and machine learning training and inference over changing data sets, can all benefit from improved scale, performance, and efficiency. Hence, this work can help accelerate code in the data center applications that are used by billions of users daily.

This project will also carry out several outreach and educational activities, along with specific collaborations to encourage industry adoption, including a tutorial, a new graduate seminar, new modules for graduate and undergraduate courses, and a new course lab assignment. Additionally, the PI will host two incoming undergraduate students from an underrepresented group for research rotations, as well as two undergraduates through the NSF REU program. The resulting tools, framework, and accelerator code will be developed in the open under a permissive license to support use and development in both academia and industry, and they will be packaged for easy use and deployment on open platforms.

In more detail, PM support in recent CPU architectures allows CPUs to access and manipulate massive data sets, lowering data access times from tens of microseconds to hundreds of nanoseconds. However, a system crash in the middle of modifying a persistent data structure can leave inconsistencies that are difficult or impossible to repair; thus, today's PM-based data structures still place software on the storage access path to perform the extra steps needed for crash consistency. The key insight of this proposal is that the interposition on PM data accesses needed for crash consistency can be done fully in hardware, without any changes to existing CPU architectures, by using newly emerging cache-coherent accelerators and field-programmable gate arrays (FPGAs). Furthermore, it can be done with existing, off-the-shelf code for data structures that were designed without PM in mind. In the proposed approach, applications interact with PM through an FPGA, which carefully controls how changes are propagated to PM to provide crash consistency. Since this interposition is in hardware, it is efficient, which helps realize the full performance potential of PM's direct load/store interface. Also, this new approach works well with CPUs' cache coherence protocols, so CPUs can cache PM data more aggressively than is safe with direct PM access; in turn, this makes the proposed approach faster than direct load/store access to PM in many cases.
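To make the crash-consistency problem concrete, the following is a minimal sketch of the kind of extra software steps that PM data structures commonly perform today. It is not the project's design: it assumes a simple undo log, simulates PM with a Python `bytearray`, and simulates a crash with an exception. The layout constants and every function name are hypothetical, chosen only for illustration. Real PM code would also need cache-line flushes and ordering fences between steps, which this simulation omits.

```python
import struct

# Hypothetical layout of a tiny simulated PM region (all offsets are assumptions):
#   bytes  0-7   balance A
#   bytes  8-15  balance B
#   bytes 16-23  "undo log is valid" flag
#   bytes 24-39  undo log: saved copy of bytes 0-15
A_OFF, B_OFF, LOG_VALID_OFF, LOG_OFF = 0, 8, 16, 24
PM_SIZE = 40


class SimulatedCrash(Exception):
    """Stands in for a power failure partway through an update."""


def read_u64(pm, off):
    return struct.unpack_from("<Q", pm, off)[0]


def write_u64(pm, off, value):
    struct.pack_into("<Q", pm, off, value)


def make_pm(a, b):
    pm = bytearray(PM_SIZE)
    write_u64(pm, A_OFF, a)
    write_u64(pm, B_OFF, b)
    return pm


def transfer_unsafe(pm, amount, crash_midway=False):
    """Two in-place stores with no protection; a crash between them loses money."""
    write_u64(pm, A_OFF, read_u64(pm, A_OFF) - amount)
    if crash_midway:
        raise SimulatedCrash()
    write_u64(pm, B_OFF, read_u64(pm, B_OFF) + amount)


def transfer_logged(pm, amount, crash_midway=False):
    """Same update, wrapped in the extra software steps of an undo log."""
    pm[LOG_OFF:LOG_OFF + 16] = pm[A_OFF:A_OFF + 16]  # 1. save old values
    write_u64(pm, LOG_VALID_OFF, 1)                  # 2. mark the log valid
    transfer_unsafe(pm, amount, crash_midway)        # 3. update in place
    write_u64(pm, LOG_VALID_OFF, 0)                  # 4. commit: discard the log


def recover(pm):
    """Run after a restart: roll back any update that did not commit."""
    if read_u64(pm, LOG_VALID_OFF) == 1:
        pm[A_OFF:A_OFF + 16] = pm[LOG_OFF:LOG_OFF + 16]
        write_u64(pm, LOG_VALID_OFF, 0)


def run_with_crash(transfer):
    """Crash midway through moving 30 units from A to B, then recover."""
    pm = make_pm(100, 0)
    try:
        transfer(pm, 30, crash_midway=True)
    except SimulatedCrash:
        pass  # the bytearray keeps whatever was written before the "crash"
    recover(pm)
    return read_u64(pm, A_OFF), read_u64(pm, B_OFF)


if __name__ == "__main__":
    print(run_with_crash(transfer_unsafe))  # (70, 0): 30 units have vanished
    print(run_with_crash(transfer_logged))  # (100, 0): rolled back cleanly
```

In this sketch, steps 1, 2, and 4 of `transfer_logged` are the software overhead on every update. The abstract's stated goal is to perform the interposition needed for crash consistency in hardware, so that application code can remain as simple as `transfer_unsafe`.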
Finally, the proposed work includes using this cache-coherent accelerator to provide replicated, fault-tolerant PM, and it includes new approaches to hiding PM and remote memory access times by implementing new, intelligent prefetching policies in hardware without CPU changes.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
