GGrantIndex
← Search

EAGER: Towards New Scalable Stochastic Flow Algorithms

$150,000FY2011CSENSF

Ohio State University, The, Columbus OH

Investigators

Abstract

The process of clustering or partitioning of nodes within a graph is a fundamental task with applications in many areas ranging from social network analysis to chip design and from biological network analysis to the analysis of intelligence networks. This project seeks to explore and develop a new class of algorithms for graph clustering based on the principle of stochastic flows. Such algorithms have been used effectively on small scale biological networks and have been shown to be robust to noise effects. However, widespread utilization has been limited due to the lack of scalability of the algorithm and its inability, in its current form, to accommodate domain-specific constraints on clustering. This exploratory project seeks to address these two limitations: First, it seeks to develop a novel approach for supporting flexible clustering in the context of stochastic flow clustering, allowing users to control the skew of the resulting clustering arrangement (e.g., to ensure balanced clusters), and allowing the nodes of a graph to participate in multiple clusters (so as to allow clusters to overlap). Second, it seeks to develop solutions that can scale to very large graphs (e.g. social networks, web graphs) through the innovative applications of graph sparsification and novel parallel algorithms on high performance systems. Open source implementation of the resulting a proof-of-concept solution will be distributed to the broader scientific community. The scientific impact of this exploratory research agenda include the following: First, if one is successful in scaling up stochastic flow algorithms to web-scale datasets while retaining its many advantages, this would open up a viable robust and improved alternative to the current state-of-the art. Second, one can also employ stochastic flow clustering algorithms in a manner analogous to spectral methods on more traditional data sources (non-graphical), enabling more wide-spread use of flow clustering algorithms. The broader impacts of the project include increased research-based training opportunities for undergraduate and graduate students in data analytics. Additional information about the project can be found at: http://www.cse.ohio-state.edu/~srini/EAGER11/

View original record on NSF Award Search →