
III: Small: Spectral Methods for Active Clustering and Bi-Clustering

$388,765 | FY2011 | CSE | NSF

Carnegie Mellon University, Pittsburgh PA

Investigators

Abstract

Clustering, the organization of data into groups, is a fundamental problem that forms the basis of exploratory data analysis and aids in data management. However, obtaining and analyzing the large-scale datasets that routinely arise in modern systems, such as the Internet and biological and social networks, often carries a significant resource and computational cost. The ability to discover meaningful clusters in high-dimensional data plagued by high noise, outliers, and missing observations will have a significant impact on understanding these systems.

This project aims to develop robust clustering methods that identify clusters efficiently by selectively querying for the most informative data measurements. Spectral clustering is a popular technique that identifies clusters by analyzing the eigenvectors of a matrix of similarity values between the data points. This project investigates the effect of missing and erroneous data on this eigenvector structure, and leverages that understanding to develop active methods that intelligently guide subsequent data queries.

Robust and efficient clustering methods are crucial for identifying groups of proteins and drugs that interact with each other, paving the way for transformative health technologies. These methods are also important for learning and maintaining the organization of computer and social networks, thus promoting the seamless exchange of ideas and technology.

The PI is disseminating the research through collaborations with the CMU Lane Center for Computational Biology, by publishing results and software online (http://www.cs.cmu.edu/~aarti/research_projects), by developing and teaching interdisciplinary courses, and through the Opportunities for Undergraduate Research in Computer Science (OurCS) program for undergraduate women at Carnegie Mellon University.
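The Python sketch below makes the spectral clustering idea in the abstract concrete. It is an illustration only and is not the project's method. It splits points into two clusters by the sign of the Fiedler vector, which is the eigenvector for the second-smallest eigenvalue of the unnormalized graph Laplacian L = D - W built from a similarity matrix W. Several choices here are assumptions made for illustration:

* the function names;
* the use of power iteration instead of a full eigendecomposition;
* the two-cluster setting;
* the convention that a missing similarity measurement is stored as 0.0.

```python
import math
import random


def laplacian(similarity):
    """Unnormalized graph Laplacian L = D - W of a symmetric similarity matrix W."""
    n = len(similarity)
    degrees = [sum(row) for row in similarity]
    return [[(degrees[i] if i == j else 0.0) - similarity[i][j] for j in range(n)]
            for i in range(n)]


def fiedler_vector(similarity, iters=500, seed=0):
    """Approximate the eigenvector of L for its second-smallest eigenvalue.

    Power iteration on M = c*I - L, with c >= the largest eigenvalue of L, turns
    the smallest eigenvalues of L into the largest eigenvalues of M. Projecting out
    the constant vector (L's eigenvector for eigenvalue 0) keeps the iteration on
    the remaining eigenvectors, so it converges to the Fiedler vector when the
    second-smallest eigenvalue is strictly smaller than the third.
    """
    n = len(similarity)
    lap = laplacian(similarity)
    # Gershgorin bound: every eigenvalue of L is at most 2 * max_i L[i][i].
    c = 2.0 * max(lap[i][i] for i in range(n))
    rng = random.Random(seed)  # fixed seed for a reproducible starting vector
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        mean = sum(v) / n
        v = [x - mean for x in v]  # remove the constant component
        w = [c * v[i] - sum(lap[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("degenerate similarity matrix (e.g. all zeros)")
        v = [x / norm for x in w]
    return v


def spectral_bipartition(similarity):
    """Assign each point to cluster 0 or 1 by the sign of its Fiedler-vector entry."""
    return [1 if x >= 0.0 else 0 for x in fiedler_vector(similarity)]


if __name__ == "__main__":
    # Hypothetical toy data: points {0, 2, 4} and {1, 3, 5} are strongly similar
    # within their group (1.0) and weakly similar across groups (0.05).
    n, eps = 6, 0.05
    W = [[0.0 if i == j else (1.0 if i % 2 == j % 2 else eps) for j in range(n)]
         for i in range(n)]
    print(spectral_bipartition(W))

    # Treat the similarity between points 0 and 2 as a missing measurement (0.0).
    W[0][2] = W[2][0] = 0.0
    print(spectral_bipartition(W))
```

The two groups in the toy data are interleaved. The sign pattern of the Fiedler vector still recovers them, and in this small example it continues to do so after one within-group similarity is dropped. Which cluster receives label 1 depends on the arbitrary sign of the eigenvector. The sketch does not say how robust this behavior is as more entries go missing or become erroneous, or which entries are worth querying next. Those are the questions the abstract describes the project as studying.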
