Collaborative Research: Statistical Network Integration
Texas A&M University, College Station TX
Investigators
Abstract
This project pursues the contemporary problem of statistical network integration facing scientists, practitioners, and theoreticians. The study of networks and graph-structured data has received growing attention in recent years, motivated by investigations of complex systems throughout the biological and social sciences. Existing models and methods for analyzing network data often focus on single networks or homogeneous data settings, yet modern data are increasingly heterogeneous, multi-sample, and multi-modal. Consequently, there is a growing need to leverage data from different sources that yield multiple network observations with attributes. This project will develop statistically principled data integration methodologies for neuroimaging studies, which routinely collect multi-subject data across different groups (strains, conditions, edge groups), modalities (functional and diffusion MRI), and brain covariate information (phenotypes, health status, gene expression data from brain tissue). The investigators will offer interdisciplinary mentoring opportunities to students participating in the research project and co-teach a workshop based on the proposed research.
The goals of this project are to establish flexible, parsimonious latent space models for network integration and to develop efficient, theoretically justified inference procedures for such models. More specifically, this project will develop latent space models to disentangle common and individual local and global latent features in samples of networks, propose efficient spectral matrix-based methods for data integration, provide high-dimensional structured penalties for dimensionality reduction and regularization in network data, and develop cross-validation methods for multiple network data integration. New theoretical developments spanning concentration inequalities, eigenvector perturbation analysis, and distributional asymptotic results will elucidate the advantages and limitations of these methods in terms of signal aggregation, heterogeneity, and flexibility. Applications of these methodologies to the analysis of multi-subject brain network data will be studied. Emphasis will be on interpretability, computation, and theoretical justification.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.