CAREER: A Unified Compiler for Sparse Array Operations and Relational Algebra
Stanford University, Stanford CA
Investigators
Abstract
Society increasingly depends on scalable data analytics to solve societal and industrial problems. However, both the data being analyzed and the computers used to analyze it are becoming so diverse and irregular that efficient software is much harder to deliver. To drive innovation in both data analytics and hardware development, computer systems are needed that take advantage of the structure and sparsity of the data and of the capabilities of new hardware. With current approaches, however, exploiting these properties and capabilities makes the software considerably more complicated. The project's novelty is the development of a software system that automatically specializes itself to combine diverse and irregular data sources and to target diverse hardware. The project's impact is a 10x-100x performance improvement across a large class of irregular data analytics computations, including sparse tensor algebra, sparse array operations, and relational algebra. Even greater performance improvements will be possible by taking advantage of emerging specialized hardware.
The project explores new programming language constructs and a new unified compiler system for sparse tensor algebra, sparse array operations (e.g., sparse NumPy), and relational algebra. The unified compiler combines three separate languages that describe, respectively, the application logic (e.g., a relational query followed by sparse array operations), the data representations (e.g., relational tries and sparse arrays stored in compressed form), and the organization of the computation (e.g., to target a GPU equipped with specialized tensor processing hardware). To achieve these goals, the project will develop a unified compiler theory for sparse array operations and relational algebra, develop new data representations for relations, and develop techniques to compile these applications to specialized accelerators.
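To make the connection between relational algebra and sparse array operations concrete, the following is a minimal illustrative sketch in plain Python. It is not the project's compiler or any of its three languages. It only shows, under simplifying assumptions, that a join of two binary relations followed by a sum aggregation computes the same result as a sparse matrix product, and that a two-level relational trie maps directly onto a compressed sparse row layout. The function names (`to_trie`, `join_aggregate`, `to_csr`) and the sample data are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def to_trie(tuples):
    """Store (first, second, value) tuples as a two-level trie:
    {first: {second: value}}. Duplicate keys are summed."""
    trie = defaultdict(dict)
    for i, j, v in tuples:
        trie[i][j] = trie[i].get(j, 0) + v
    return dict(trie)


def join_aggregate(r, s):
    """Join R(a, b) with S(b, c) on b, multiply the value columns,
    and sum over b. This is the same computation as the sparse
    matrix product C[a, c] = sum_b R[a, b] * S[b, c]."""
    out = defaultdict(dict)
    for a, row in r.items():
        for b, rv in row.items():
            # Only values of b present in both relations contribute.
            for c, sv in s.get(b, {}).items():
                out[a][c] = out[a].get(c, 0) + rv * sv
    return {a: dict(row) for a, row in out.items()}


def to_csr(trie, n_rows):
    """Flatten a two-level trie into compressed sparse row arrays:
    row positions, column coordinates, and values."""
    pos, crd, vals = [0], [], []
    for i in range(n_rows):
        for j in sorted(trie.get(i, {})):
            crd.append(j)
            vals.append(trie[i][j])
        pos.append(len(crd))
    return pos, crd, vals


# Hypothetical sample relations, each row is (key1, key2, value).
R = [(0, 1, 2.0), (0, 2, 1.0), (2, 0, 3.0)]
S = [(1, 0, 4.0), (2, 0, 5.0), (0, 1, 1.0)]

result = join_aggregate(to_trie(R), to_trie(S))
csr = to_csr(result, n_rows=3)
print(result)  # {0: {0: 13.0}, 2: {1: 3.0}}
print(csr)     # ([0, 1, 1, 2], [0, 1], [13.0, 3.0])
```

In this sketch the application logic (the join and aggregation), the data representation (trie or compressed rows), and the loop order are all hand-written and entangled in one function. The abstract describes keeping these three concerns in separate languages so that a compiler, rather than the programmer, generates the specialized code.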
Success of this project will lead to new flexible data analytics systems that port across both data structures and hardware and that are orders of magnitude faster than current general approaches without loss of generality. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.