Statistical Inference for Multilayer Network Data with Applications
The University Of Central Florida Board Of Trustees, Orlando FL
Abstract
Analysis of stochastic networks is extremely important and is used in a wide variety of applications, such as sociology, biology, genetics, ecology, information technology and national security. Networks are very convenient for describing relationships between nodes, which may represent, for example, people in a social network or regions of a person's brain. Most networks can be divided into communities with distinct properties and connection patterns. These partitions can be used to answer a variety of questions, such as finding tightly connected social groups or identifying patterns of connections between brain regions that are associated with a disease. Initial efforts focused on the analysis of a single network. In the last few years, however, one of the most important directions in network science has shifted to the study of collections of individual networks, the so-called multilayer networks, owing both to their versatility and to the variety of applications that can be addressed using this concept. The objective of this research is to develop tools for the theoretical and algorithmic analysis of such networks. The theory developed can be applied to the analysis of speech-related brain networks that can be affected by epilepsy surgery. This research will be carried out in collaboration with the Functional Brain Mapping and Brain Computer Interface Lab of Advent Health Hospital for Children.
The techniques resulting from this project could be applied in a variety of fields that rely on the analysis of multilayer stochastic network data: a) medical practice, since a better understanding of connections between brain regions associated with speech will lead to safer and more efficient treatment options for epilepsy; b) medical research, by providing tools that account for individual variations in connections between brain regions associated with particular diseases; c) brain science research, by providing tools for the analysis of brain networks and their variations; d) molecular biology, by proposing techniques for analyzing enzymatic influences between proteins related to various functions; e) statistical genetics, by developing procedures for the simultaneous study of gene networks related to several diseases; f) international relations and finance, by analyzing world trade and financial networks corresponding to various modalities; g) social sciences, by analyzing similarities and differences in communities related to various types of social connections. Funding will also support workforce training through various educational activities, as well as the promotion of interdisciplinary research and diversity.
The research agenda of this proposal will substantially advance the field of non-parametric statistics in general, and the emerging field of network data analysis in particular. The surge of interest in multilayer networks has led to a stream of publications on the subject. However, these publications fall into two very distinct categories: application-driven papers that offer no theoretical guarantees for their results, and statistical papers that provide such guarantees, but only under very restrictive assumptions. While in many applications the main goal is to determine the differences between communities in different layers or sets of layers, the statistical papers focus entirely on the case where the communities are the same in all layers.
Due to the absence of relevant theoretical results, authors of applied work either use ad hoc techniques or are forced to make the questionable assumption that the community structure is the same for all layers. For this reason, there is an overwhelming need to lay solid theoretical foundations and to develop efficient computational algorithms for the analysis of multilayer networks with diverse community structures. In particular, the objective of this research is the construction of non-parametric techniques for estimation and clustering of multilayer networks in which each layer follows the popular Stochastic Block Model, and the community structures coincide for some layers and differ for others. Furthermore, the statistical procedures will be supplemented with precision guarantees via oracle inequalities and minimax studies. This will be accomplished by applying modern algebraic techniques recently employed by the PI. In summary, the research will significantly broaden the arsenal of methods for the analysis of multilayer network data by developing techniques for non-parametric estimation and clustering that require few assumptions, are computationally viable, and are accompanied by theoretical precision guarantees. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
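To make the multilayer setting described above concrete, the following Python sketch simulates a small multilayer network in which every layer is a two-community Stochastic Block Model. Layers 0 and 1 share one community structure, while layers 2 and 3 share a different one. The sketch then estimates each layer's partition with a simple spectral method (the sign of the leading eigenvector of the density-centered adjacency matrix) and compares the estimated partitions across layers. It is an illustrative sketch only, assuming NumPy. The helper names (`sample_sbm_layer`, `spectral_two_communities`, `agreement`), the block probabilities and the layer grouping are hypothetical choices. The naive per-layer spectral step is a stand-in, not the estimation and clustering procedures proposed in this project.

```python
import numpy as np


def sample_sbm_layer(z, B, rng):
    """Draw one undirected SBM layer: A[i, j] ~ Bernoulli(B[z[i], z[j]]) for i < j."""
    n = len(z)
    P = B[z][:, z]                                 # n x n matrix of edge probabilities
    upper = np.triu(rng.random((n, n)) < P, k=1)   # sample the strict upper triangle only
    return (upper | upper.T).astype(float)         # symmetrize; diagonal stays zero


def spectral_two_communities(A):
    """Split nodes into two groups by the sign of the top eigenvector of the centered adjacency."""
    n = A.shape[0]
    density = A.sum() / (n * (n - 1))    # average off-diagonal edge density
    M = A - density                      # subtract it so the all-ones direction does not dominate
    np.fill_diagonal(M, 0.0)
    vals, vecs = np.linalg.eigh(M)       # eigenvalues returned in ascending order
    lead = vecs[:, -1]                   # eigenvector of the largest eigenvalue
    return (lead > 0).astype(int)


def agreement(z1, z2):
    """Fraction of nodes on which two 2-group labelings agree, up to swapping the labels."""
    same = np.mean(z1 == z2)
    return max(same, 1.0 - same)


rng = np.random.default_rng(0)
n = 100
B = np.array([[0.7, 0.05],
              [0.05, 0.7]])                       # hypothetical assortative block probabilities
z_a = (np.arange(n) >= n // 2).astype(int)       # community structure shared by layers 0 and 1
z_b = np.arange(n) % 2                            # a different structure shared by layers 2 and 3
layer_labels = [z_a, z_a, z_b, z_b]

layers = [sample_sbm_layer(z, B, rng) for z in layer_labels]
estimates = [spectral_two_communities(A) for A in layers]

# Pairwise agreement between the estimated partitions of all layers
S = np.array([[agreement(e1, e2) for e2 in estimates] for e1 in estimates])
print(np.round(S, 2))
```

Layers that share a community structure should show agreement close to 1, while layers from different groups should sit near 0.5, which is what two unrelated balanced partitions give. Unlike this sketch, a method designed for multilayer data would pool information across the layers that share a structure instead of treating each layer separately.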