CAREER: Learning of graph diffusion and transport from high dimensional data with low-dimensional structures

$238,421 | FY2023 | MPS | NSF

Duke University, Durham NC

Investigators

Abstract

Graph-based methods are pivotal tools in big data analysis due to their powerful ability to model data in various fields of science and industry. For high-dimensional data, an affinity graph can be constructed from the data cloud, and the graph geometry will recover the implicit low-dimensional structure of the data. Therefore, a graph-based approach has the potential to overcome the curse of dimensionality and provide distribution-free methods for predictive and generative learning tasks. The overarching goal of this project is to develop a theoretical and computational framework for graph-based data analysis that overcomes the curse of dimensionality of high-dimensional data by leveraging the underlying low-dimensional geometric structure in the data. The mathematical results can be applied to data visualization and dimension reduction, generative models, general unsupervised learning, and a wide range of real applications, from single-cell sequencing to sensor networks. The project will provide research opportunities and projects that are suitable for graduate and undergraduate students, and results of the project will produce pedagogical materials to be incorporated into data science courses at the undergraduate and graduate levels.

The project aims to develop theoretical and computational tools for efficient and accurate graph-based analysis of high-dimensional data that captures the intrinsically low-dimensional, non-linear structures in the data. The research work consists of four integrated topics: (1) learning of graph diffusion with a theoretical guarantee, (2) robust graph affinity for graph-based data analysis, (3) graph-based learning of intrinsic optimal transport in high dimensions, and (4) generative model of graph data by gradient flow. Using tools from applied harmonic analysis and high-dimensional probability, the project will address several open questions in the field. On the theoretical side, the project will model the implicit low-dimensional structure as data lying on or near hidden manifolds embedded in the high-dimensional space and analyze the convergence of the graph operators in the limit of large samples. On the practical side, the project will develop algorithms whose sampling and computational complexities depend only on the intrinsic data dimensionality. The mathematical findings will provide computational tools to analyze data in real-world applications, including biomedical and network data.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
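As an illustration of the affinity-graph idea mentioned in the abstract, the following is a minimal sketch and not the project's own method. It assumes a Gaussian kernel affinity, a random-walk normalization of the graph, and a toy data set (a circle linearly embedded in 50 dimensions with small noise). The function name `diffusion_embedding`, the bandwidth `eps`, and all data parameters are hypothetical choices made for this example.

```python
import numpy as np

def diffusion_embedding(X, eps, n_coords=2):
    """Embed the rows of X with the leading nontrivial eigenvectors
    of the random-walk operator P = D^{-1} W on a Gaussian affinity graph.
    (Illustrative sketch; kernel and normalization are assumptions.)"""
    # Pairwise squared Euclidean distances, clipped at zero for round-off.
    sq = np.sum(X**2, axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    # Gaussian affinity matrix W and node degrees.
    W = np.exp(-D2 / eps)
    deg = W.sum(axis=1)
    # S = D^{-1/2} W D^{-1/2} is symmetric and has the same eigenvalues as P.
    S = W / np.sqrt(np.outer(deg, deg))
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]          # sort eigenvalues in descending order
    vals, vecs = vals[order], vecs[:, order]
    # Right eigenvectors of P are D^{-1/2} times the eigenvectors of S.
    psi = vecs / np.sqrt(deg)[:, None]
    # Skip the trivial constant eigenvector (eigenvalue 1).
    return vals[: n_coords + 1], psi[:, 1 : n_coords + 1]

# Toy data: a circle (intrinsic dimension 1) embedded in 50 ambient dimensions.
rng = np.random.default_rng(0)
n, ambient_dim = 300, 50
theta = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
circle = np.column_stack([np.cos(theta), np.sin(theta)])
Q, _ = np.linalg.qr(rng.standard_normal((ambient_dim, 2)))  # orthonormal columns
X = circle @ Q.T + 0.01 * rng.standard_normal((n, ambient_dim))

vals, emb = diffusion_embedding(X, eps=0.5)
```

In this toy setting, the two returned coordinates in `emb` trace out an approximately circular curve, so the one-dimensional structure is visible even though the input has 50 coordinates. The bandwidth `eps` was picked by hand here; how to choose it, and how such graph operators behave as the sample size grows, are the kind of questions the abstract describes.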
