ITR: Emerging Communities in Large Linked Networks: Theory Meets Practice
Cornell University, Ithaca NY
Investigators
Abstract
This project will develop the theory, concepts, and tools to track changes and detect emerging structure in large networks. It combines a theoretical investigation of how networks and communities evolve over time with empirical studies using the NEC CiteSeer database. Clustering plays a crucial role in detecting community structure, but tracking changes in structure over time places new demands on clustering algorithms. In particular, it requires a degree of stability not usually demanded of clustering techniques. The project starts from the premise that real communities of various sizes exist in the data and that these communities are invariant under changes such as the random removal of 5-10% of the papers in the database. If a clustering technique gives a radically different clustering when an additional 10,000 papers are added to the database, it will be impossible to separate small changes in the evolving structure from artifacts of the clustering technique. The project therefore builds on a concept of natural communities of various sizes that can still be identified under fairly substantial changes in the data. A central component of the work is to develop a principled generative model of growing random graphs that has complex evolving community structure and in which the concept of a natural community arises. Tools will be developed to find these natural communities with sufficient stability to track real changes in the clusters over time and to identify when new communities emerge. The ability to track emerging trends and detect hidden structure in large networked digital data sources could be of great societal benefit. In particular, it would allow end users to search for information in a more informed manner and would enable a proactive approach to emerging trends and new developments.
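The stability premise described in the abstract (communities should survive the random removal of 5-10% of the papers) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: connected components serve only as a stand-in for a real clustering technique, the best-match Jaccard score is one plausible way to compare clusterings, and the names `connected_components`, `jaccard`, `stability`, `drop_fraction`, and `trials` are hypothetical. It is not the project's method.

```python
import random
from collections import defaultdict


def connected_components(nodes, edges):
    """Stand-in clustering (an assumption): each connected component is one community."""
    adj = defaultdict(set)
    for u, v in edges:
        # Ignore edges that touch a removed node.
        if u in nodes and v in nodes:
            adj[u].add(v)
            adj[v].add(u)
    seen, clusters = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, comp = [start], set()
        seen.add(start)
        while stack:
            n = stack.pop()
            comp.add(n)
            for m in adj[n]:
                if m not in seen:
                    seen.add(m)
                    stack.append(m)
        clusters.append(frozenset(comp))
    return clusters


def jaccard(a, b):
    """Jaccard similarity of two node sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def stability(nodes, edges, drop_fraction=0.05, trials=20, seed=0):
    """Mean best-match Jaccard between each original community (restricted to
    the surviving nodes) and the communities found after random node removal."""
    rng = random.Random(seed)
    base = connected_components(nodes, edges)
    node_list = sorted(nodes)
    scores = []
    for _ in range(trials):
        k = int(len(node_list) * drop_fraction)
        kept = set(node_list) - set(rng.sample(node_list, k))
        perturbed = connected_components(kept, edges)
        for community in base:
            survivors = community & kept
            if survivors:
                scores.append(max(jaccard(survivors, c) for c in perturbed))
    return sum(scores) / len(scores) if scores else 1.0


# Toy data (hypothetical): two disjoint 10-node cliques, and a 20-node chain.
clique_nodes = set(range(20))
clique_edges = [(i, j) for base in (0, 10)
                for i in range(base, base + 10)
                for j in range(i + 1, base + 10)]
chain_nodes = set(range(20))
chain_edges = [(i, i + 1) for i in range(19)]

clique_score = stability(clique_nodes, clique_edges, drop_fraction=0.10)
chain_score = stability(chain_nodes, chain_edges, drop_fraction=0.10)
print(f"cliques: {clique_score:.2f}, chain: {chain_score:.2f}")
```

In this toy setting the densely connected cliques keep a score of 1.0, because removing two nodes cannot disconnect a clique, whereas the chain scores lower, because removing an interior node splits it. A score that falls sharply under small random removals would suggest that apparent changes over time are artifacts of the clustering technique rather than real changes in community structure.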