GGrantIndex
← Search

CAREER: Statistical Inference on Large Domains and Large Networks: Fundamental Limits and Efficient Algorithms

$571,000FY2017CSENSF

Yale University, New Haven CT

Investigators

Abstract

The emergence of data science has brought about new perspectives on age-old statistical questions. Driven by contemporary applications such as neuroscience, functional genomics and social networks, an emerging research thread in high-dimensional statistics deals with new problems of a combinatorial nature, most importantly, inference on large domains and large graphs, such as entropy estimation on large alphabets, detecting community structures in networks, all of which rely on exploiting the intrinsic or extrinsic low-dimensionality of the problem. On the other hand, an important element absent from the classical statistical paradigm is the computational complexity of inference procedures, which is becoming increasingly relevant dealing with large-scale noisy datasets. The PI has detailed plans for mentoring both undergraduate and graduate students, targeting specifically at members of the under-represented communities. Combining both statistical and computational perspectives, this research develops an interdisciplinary program aiming to advance the understanding of the fundamental inferential and algorithmic limits of statistical estimation on large domains and large networks, including functional estimation on large alphabets, learning graph properties with network sampling, extrapolating unseen species. These objectives come with a set of theoretical and practical challenges, which can be effectively addressed by a new combination of insights and techniques from information theory, high-dimensional statistics, approximation theory, random graphs and matrices, and optimization. In addition to determining the statistical limits and designing efficient and scalable procedures, which provide key enabling technologies for such high-impact applications of data science as neuroscience, gene association networks, and social network analysis, a major innovation lies in rigorously identifying the optimal statistical performance under complexity constraints and, complementarity, understanding the inferential power and limitations of important classes of algorithms such as spectral methods, relaxation hierarchies, and message-passing algorithms.

View original record on NSF Award Search →