Collaborative Research: PPoSS: LARGE: A Full-Stack Architecture for Sparse Computation

$2,250,000FY2022CSENSF

Massachusetts Institute Of Technology, Cambridge MA

Investigators

Daniel Sanchez Martincontact Joel S Emer Manya Ghobadi Mengjia Yan

Abstract

Computer systems have been designed and optimized primarily for dense computations, i.e., those that process regularly structured data. But current systems are ill-suited to sparse computations, i.e., those that process unstructured data. Sparse computations are very common because many relations and interactions are sparse. For example, most people are not friends and most neurons are not directly connected. Sparse computations take advantage of this sparsity by encoding and processing only meaningful relations, such as storing only the non-zero elements of a matrix. These applications are crucial in many domains, like deep learning, data analytics, and scientific computing, but their irregular structure makes them inefficient and hard to scale in current systems, wasting billions of dollars yearly. This project aims to redesign the computing stack to provide first-class support for sparse computations. The project's novelties include a full system stack that spans programming languages, compilers, and specialized hardware architectures and large-scale computer systems. The project's impacts include making future parallel systems much more versatile, scalable, energy efficient and easier to program. This project takes a coordinated approach across the system stack to unlock the performance and scalability of sparse computations, because they pose challenges that cannot be addressed at a single layer. For example, sparse computations have a rich space of choices in algorithm, data representation, and schedule, which current languages and compilers cannot capture or optimize properly. The right choice of algorithm and data representation are often unknown in advance and may change at run-time, thwarting the rigid division between current compilers and schedulers. Irregular, data-dependent control and memory accesses stymie compiler analysis, hinder parallelization, make poor use of hardware, and introduce numerous side channels that thwart security. Finally, their data-intensive nature is a poor match to the processors and accelerators pervasive in current clusters and datacenters, which optimize for compute operations rather than to minimize data movement. To tackle these challenges, this project will develop a full system stack spanning domain-specific languages, a tightly integrated compiler and scheduler, and specialized hardware architectures and high-performance, multi-node computer systems and networks. This stack is built around a unifying abstraction, a novel sparse intermediate representation that (1) encodes semantic information on key sparse data structures and their iterations, (2) enables optimizing compiler transformations and dynamic scheduling decisions, and (3) can be easily compiled to parallel architectures, including graphics processing units (GPUs), general-purpose processors, our proposed specialized architecture, and their combination. The full stack will be designed with security at the forefront, leveraging novel cross-layer techniques to achieve secure high performance. This system will be rigorously evaluated using a broad set of sparse applications and at a wide range of system scales, including large-scale clusters with hundreds of GPUs or tens of specialized processors. By innovating across the full software and hardware stack, these techniques will achieve performance, scalability, and efficiency gains that single-layer approaches cannot provide. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

View original record on NSF Award Search →