CAREER: Geometric and Combinatorial Methods for Distribution-Free Inference and Dependent Network Data
University of Pennsylvania, Philadelphia, PA
Abstract
Modern statistical applications often involve multivariate data that violate the convenient assumptions of independent sampling and tractable parametric forms. For instance, parametric methods are often inadequate in the analysis of complex high-dimensional data arising from genomics, epidemiology, and bioinformatics. This necessitates the development of procedures that are agnostic to the distribution of the data, computationally efficient, and yet statistically powerful for large nonparametric classes. Similarly, the classical assumption of independence is routinely violated in combinatorial datasets arising from social networks, making it increasingly important to develop realistic and mathematically tractable methods for modeling structure and dependence in high-dimensional distributions. This project leverages ideas from recent developments in optimal transport theory, random geometric graphs, and statistical physics to gain a deeper understanding of (1) multivariate distribution-free inference and (2) dependent network data. The educational and outreach component of this project will aim to foster undergraduate research and prepare graduate students in mentoring, through curriculum development, directed reading groups, and summer programs.
The first component of this project will study the efficiency properties of nonparametric, distribution-free two-sample tests based on the emerging theory of multivariate ranks, which include, among others, the rank analogue of the celebrated energy distance test. The project will also explore the asymptotic properties of tests based on optimal matchings and their applications to detecting balance in observational studies. The second component of this project will focus on modeling dependence in complex relational data, using the Ising model and, more generally, higher-order (tensor) Markov random fields. The goal here is to build a framework for simultaneously modeling the network dependency (arising from neighborhood interactions) and the individual node effects, and to develop a holistic theory of parameter estimation in these models using recent advances on random tensors and tools from statistical physics.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.