Network Comparison, a Cornerstone of the Foundations of Network Science

$125,000 | FY2016 | MPS | NSF

Santa Fe Institute, Santa Fe NM

Investigators

Joshua A Grochow, Laurent Hebert-Dufresne

Abstract

"Big data" increasingly means big networks, because such data either directly concerns relational structures, as in human and animal mobility patterns and gene interaction networks, or is influenced by an underlying relational structure, as in epidemiological studies, urban studies, and cultural diffusion. Most applications of networks rely crucially on comparing networks, for example to detect changes in one network across time, to categorize or classify multiple networks of similar types, or to build analogies across fields by comparing networks of different origins. The question of how to compare two networks in a principled way, without relying on the ad hoc choice of statistics used by many current comparison methods, is key to the foundations of network science. This project will bring to bear new ideas from mathematics, computer science, and statistical physics on the problem of principled, structural comparison of networks. Through pre-existing collaborations, the PIs will leverage these new comparison methods to address questions in several different areas, for example: how food webs change across gradients like latitude, altitude, and temperature; the morphological growth patterns of bacterial colonies; the evolution of human culture and communities; and links between socio-economic indicators and epidemiology.

Our project will develop new rigorous and principled methods of comparing the structure of complex networks. The methods to be pursued aim to get away from single-scale summary statistics; to break new ground, we must think in terms of structural distance rather than statistical inference. In combination with tools from machine learning, such structural comparison methods are an important step towards defining the "space of real-world networks", which could serve as a more rigorous basis for a theory of complex networks.
These methods have four advantageous features: (1) they systematically consider multiple scales of network organization, (2) they do not depend on an identification of the nodes of the two networks beforehand, (3) they can compare networks of different sizes, and (4) they are not dependent on any particular generative model of network growth. Very few, if any, of the existing network comparison methods have all of these features, and those that do exist have not been extensively developed. These features enable many new applications in a range of areas, including ecology, microbiology, cultural evolution, and epidemiology.
