CAREER: Fast and Scalable Combinatorial Algorithms for Data Analytics

$517,268 | FY2016 | CSE | NSF

Washington State University, Pullman WA

Abstract

We are in an age when massive digital data continues to be collected at an extraordinarily rapid rate and with high and growing complexity (and concomitant uncertainty), when the data and the actors behind it are increasingly interconnected, and when architectures of computing platforms continue to rapidly change. Fast, robust and scalable algorithms that are simultaneously cognizant of all three dimensions (data, interconnection, and computing platform) are acutely needed for the purpose of analyzing massive datasets and extracting knowledge and insight from the data. This project (named FASCADA, Fast and Scalable Combinatorial Algorithms for Data Analysis) will explore the interplay between graph and matrix algorithms in order to develop methods for data analytics that perform at scale on contemporary platforms, with a primary focus on data that are expressed in terms of networks. Algorithmic research progress to be made in the project will be realized in software implementations, will be integrated with existing software tools when applicable, and will be made available to the wider community as open-source software. As part of the project's integrated education and outreach component, two new innovative courses, an undergraduate course on Data Science and a graduate course on Network Science, will be developed and taught. The educational effort will contribute to meeting the rapidly expanding need for a trained workforce in data science in the US economy and will contribute to US competitiveness in the global market. Underrepresented minority groups in computing sciences and engineering will be recruited and mentored through an existing, effective program in the Pacific Northwest (LSAMP), and undergraduate students from Heritage University will be mentored through summer internships at Washington State University. The specific research aims of FASCADA are organized under four intertwined areas. 
(1) Enabling Scalable Data Analytics: devise novel "problem-partitioning" methods that are useful for solving, at scale, optimization problems underlying a large class of machine learning algorithms.
(2) Network Analysis: develop fast algorithms for discovering and analyzing dense subgraphs in real-world networks arising from diverse domains.
(3) High Performance Computing: develop effective paradigms for the parallelization of inherently sequential graph algorithms targeting many-core architectures.
(4) Algorithmic Differentiation (AD): advance AD as a technology by designing better graph-based algorithms for Hessian computation, and use AD in emerging applications, including quantifying uncertainty.

A common thread that runs through all four of the areas is a focus on graph problems and their solution. The novelty of the proposed approach lies in the exploration of the bidirectional interaction between graph and matrix algorithms. Results from this effort will advance fundamental knowledge at the intersection of a range of areas, including data science, computational science and engineering, computational mathematics, and high performance computing. For further information, visit the project webpage http://www.eecs.wsu.edu/~assefaw/fascada.