OAC: Small: Data Locality Optimization for Sparse Matrix/Tensor Computations

$499,391FY2020CSENSF

University Of Utah, Salt Lake City UT

Investigators

Abstract

The cost of data movement vastly exceeds the cost of execution of arithmetic operations on current computers and the imbalance is only expected to get worse. Hence the minimization of data movement in the implementation of algorithms is critical. Tiling is a well known technique for data-locality optimization and is widely used in compilers as well as high-performance numerical libraries for dense matrix/tensor computations. However, data-locality optimization for sparse computations is a significant challenge, in large part because the data access patterns are not known a priori. This project proposes a plan of research to systematically explore a number of issues pertaining to data-locality optimization for sparse matrix/tensor computations. The project identifies an important subclass of sparse computations used in machine learning and data analytics, and proposes tools and techniques to enable high-performance parallel implementations on multicore CPUs and GPUs. The broader impact of the project will be the enhancement of programmer productivity and the enabling of software portability and high performance for applications in data analytics and machine learning. The challenge of data-locality optimization for the data-dependent and irregular access patterns that occur with sparse matrix/tensor computations will be addressed through research along multiple directions: 1) Compact signatures for sparse matrices: the strong relationship between the data access patterns for key sparse matrix primitives of use in machine learning and data analytics drives the development of one-dimensional signature vectors that capture the essential characteristics of the two-dimensional sparsity pattern as it pertains to needed data movement in a memory hierarchy; 2) Sparse tiling: Sparse matrix signature vectors will serve as a basis for dynamic decisions based on target platform characteristics, for tile size selection and scheduling of tiles for load-balanced execution; 3) Matrix renumbering/reordering: The impact of row/column reordering on the performance of sparse matrix primitives will be investigated, and new reordering schemes will be devised to enhance data-locality for key sparse matrix/tensor primitives; 4) Sparse microkernels: Microkernels will be developed and optimized for CPUs/GPUs, and used as the lowest-level building blocks that execute the innermost tiles in the tiled execution of sparse matrix/tensor computations; 5) Architecture-aware performance prediction: Models will be developed that combine analysis of predicted data-movement volume in combination with machine learning using algorithmic and architectural features. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

View original record on NSF Award Search →