GGrantIndex
← Search

Collaborative Research: Randomized Algorithms in Linear Algebra and Numerical Evaluations on Massive Datasets

$220,413FY2010MPSNSF

Rensselaer Polytechnic Institute, Troy NY

Investigators

Abstract

Data matrices have structural properties that present challenges and opportunities for both the Numerical Linear Algebra (NLA) community and the Theory of Algorithms (ToA) community. Matrix factorizations, such as the eigendecomposition, the rank-revealing QR factorization, and the Singular Value Decomposition, have been widely used for information retrieval. Historically, matrix factorizations have been of central interest in NLA since one can use them to express a problem in such a way that it can be solved more easily. ToA, on the other hand, has recently addressed the computation of such decompositions from a sampling perspective. The two approaches are complementary. However, thus far, the two communities have not worked closely together to integrate them. More often than not, each community is only cursorily aware of developments in the other community. Defining significant research directions that both communities can work on, and applying the resulting linear algebraic algorithms to data analysis problems (among others) will lead to important breakthroughs. The main objective for this proposal is to bridge the existing gap and bring together NLA and ToA researchers to promote cross-fertilization of ideas that could have immediate and long-term impact on data analysis. Towards that end, the PIs will work on set of prototypical research problems that can significantly benefit from ideas and research in both NLA and ToA. These problems range from approximating the singular values and vectors of a matrix by element-wise sampling to random-projection-based algorithms for least-squares problems and the design of randomized algorithms for the widely used non-negative matrix factorization. This proposal seeks to explore the complementary perspectives that the Numerical Linear Algebra and the Theory of Algorithms (ToA) communities bring to linear algebra and matrix computations. This is a timely quest, motivated by technological developments over the last two decades that permit the automatic generation of large datasets. Such datasets are often modeled as matrices. The proposed work will serve as a demonstration project on the fruitfulness of collaboration between the NLA and the ToA communities on problems that are of common interest. It is expected that the proposed research will demonstrate commonality in the two approaches, as well as highlight the advantages of the dual perspective. Through outreach activities, the PIs hope to motivate even more researchers to undertake similar investigations on related topics. The proposed algorithms will be numerically evaluated on a suite of matrices from application domains that the PIs have been working on over the past few years, in order to understand better their properties and to demonstrate their potential in dealing with the modern, massive datasets. More specifically, the PIs will test the proposed strategies on population genetics data in order to infer ancestry of individuals, as well as gene expression data in order to investigate hypotheses that correlate genes and diseases. As such, we expect that the developed algorithms will impact the areas of linear algebra, randomized algorithms, information retrieval and data mining, as well as bioinformatics. Finally, in order to disseminate the proposed research, the PIs intend to organize workshops (following the example of the Workshops on Algorithms for Modern Massive Datasets in 2006, 2008, and 2010; the PIs were co-organizers of these workshops) and working group meetings, and will disseminate their research via blogs and articles intended for broader audiences.

View original record on NSF Award Search →