CAREER: Leveraging Randomization and Structure in Computational Linear Algebra for Data Science

$513,309FY2024CSENSF

Regents Of The University Of Michigan - Ann Arbor, Ann Arbor MI

Investigators

Abstract

Data science plays a central role in addressing societal challenges, such as healthcare, climate change, and urban planning. At the core of nearly all developments in algorithms for data science is computational linear algebra, an area that concerns the study of algorithms for solving ubiquitous problems involving matrices and other linear-algebraic objects that are used to represent data. With ever-increasing data sizes, randomization has become a key technique for developing efficient algorithms in computational linear algebra. Yet, there is a significant gap between the theory and practice of these algorithms, which has slowed their practical adoption in data science applications. This project identifies key challenges and puts forward new directions towards providing the algorithmic foundations necessary to ensure that a broad scope of randomized linear algebra algorithms are successfully deployed across computational data science over the next decade. This project leverages fundamental interdisciplinary ideas at the intersection of theoretical computer science, machine learning, statistics, and nonlinear optimization. In addition to developing the theoretical foundations, one of the key aims driving the project is to facilitate ongoing implementation efforts aimed at incorporating randomization into LAPACK, the default computational linear algebra software package in machine learning, engineering, statistics, and scientific computing for the past thirty years. At the core of the project is an integrated education plan focused on helping students to gain an interdisciplinary skillset at the intersection of algorithmic foundations and data science. The project also involves outreach to students from three underresourced high schools in Michigan through a collaboration with the university's Engineering Pathways program. The project’s objectives are to close the theory-practice gap in using randomization to design improved algorithms for ubiquitous matrix problems such as matrix multiplication, solving linear systems, and low-rank approximation. The project identifies three major thrusts, namely (1) reformulating optimal matrix sketching via black-box sampling methods; (2) randomized iterative refinement algorithms via stochastic optimization; (3) a study of robustness of randomized numerical linear algebra algorithms to preserve certain structural elements of data. The matrix sketch, i.e., a small randomized approximation of the input data is a key foundational component of these algorithms. The project aims to develop new algorithmic and theoretical approaches towards ensuring the control and reliability of the output produced by matrix sketching and sub-sampling, which is especially challenging when dealing with randomization and will be critical for successful software integration. Building on these tools, the project pursues new approaches for designing high-precision algorithms solving linear systems and quadratic problems, by exploring techniques that lie in the unexplored regime between deterministic iterative solvers and stochastic optimization. Finally, the project aims to contribute to a unified understanding of randomized matrix approximation algorithms that preserve the structure of the data, which is essential for feature selection, experimental design, interpretability, and more. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

View original record on NSF Award Search →