CLUSTERING AND SEARCH FOR INFORMATION SYSTEMS USED BY BIOLOGISTS
Carnegie-Mellon University, Pittsburgh PA
Abstract
This subproject is one of many research subprojects utilizing the resources provided by a Center grant funded by NIH/NCRR. Primary support for the subproject and the subproject's principal investigator may have been provided by other sources, including other NIH sources. The Total Cost listed for the subproject likely represents the estimated amount of Center infrastructure utilized by the subproject, not direct funding provided by the NCRR grant to the subproject or subproject staff.

Spectral clustering is an effective and elegant clustering method based on the pairwise similarity between objects. I have recently developed a fast and simple spectral-clustering-like technique called power iteration clustering (PIC). As in spectral clustering, points are embedded in a low-dimensional subspace derived from the similarity matrix for the data points. However, whereas spectral clustering derives this subspace from the bottom eigenvectors of the Laplacian of an affinity matrix, the proposed method uses a subspace that approximates a linear combination of these eigenvectors. The new method produces clusters comparable to or better than those of existing spectral methods, but it is extremely scalable and well suited to parallel processing on a cluster machine (such as codon or warhol).

I would like to explore the use of this clustering method for information spaces associated with biologists' personal information needs; this is a project funded by NIH, but one without extensive computational resources at the moment. For more information on the technique, see http://www.cs.cmu.edu/~wcohen/postscript/nips-2009-pic.pdf For more information on the project, see http://www-2.cs.cmu.edu/~wcohen/querendipity/
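The power-iteration idea can be illustrated with a minimal pure-Python sketch. This is not the author's code; it follows the general recipe of the linked NIPS 2009 paper under several stated assumptions: a Gaussian affinity on 1-D points (the kernel and `sigma` are illustrative choices), a starting vector proportional to node degree, early stopping once the per-step change stops changing (threshold `1e-5 / n` assumed), and a simple 1-D k-means with centers initialized from the minimum to the maximum value. All function names (`gaussian_affinity`, `power_iteration_embedding`, `kmeans_1d`, `pic`) are hypothetical. Repeatedly multiplying by the row-normalized matrix W = D^-1 A expands the vector as a sum of eigenvectors weighted by powers of their eigenvalues. The top eigenvectors of W correspond to the bottom eigenvectors of the random-walk Laplacian I - W, so stopping early leaves a mixture of exactly the eigenvectors spectral clustering would use.

```python
import math


def gaussian_affinity(points, sigma=1.0):
    """Dense affinity matrix for 1-D points (assumed Gaussian kernel), zero diagonal."""
    n = len(points)
    return [
        [0.0 if i == j else math.exp(-((points[i] - points[j]) ** 2) / (2 * sigma ** 2))
         for j in range(n)]
        for i in range(n)
    ]


def power_iteration_embedding(A, eps=None, max_iter=1000):
    """Run truncated power iteration on W = D^-1 A and return the 1-D embedding v_t."""
    n = len(A)
    degrees = [sum(row) for row in A]  # assumes every point has nonzero total affinity
    W = [[a / d for a in row] for row, d in zip(A, degrees)]  # row-normalized affinities
    vol = sum(degrees)
    v = [d / vol for d in degrees]  # assumed start: degree vector, L1-normalized
    if eps is None:
        eps = 1e-5 / n  # assumed stopping threshold
    delta_prev = None
    for _ in range(max_iter):
        wv = [sum(w * x for w, x in zip(row, v)) for row in W]  # one matrix-vector product
        norm = sum(abs(x) for x in wv)
        v_new = [x / norm for x in wv]  # keep ||v||_1 = 1
        delta = [abs(a - b) for a, b in zip(v_new, v)]  # per-point change this step
        v = v_new
        # Stop once the change itself stops changing (near-zero "acceleration"),
        # i.e. before v collapses onto the constant top eigenvector of W.
        if delta_prev is not None and max(abs(a - b) for a, b in zip(delta, delta_prev)) < eps:
            break
        delta_prev = delta
    return v


def kmeans_1d(values, k, iters=100):
    """Plain Lloyd's k-means on scalars; centers start evenly spaced from min to max."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * c / (k - 1) for c in range(k)] if k > 1 else [lo]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign each value to its nearest center.
        labels = [min(range(k), key=lambda c: abs(x - centers[c])) for x in values]
        new_centers = []
        for c in range(k):
            members = [x for x, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def pic(points, k, sigma=1.0):
    """Hypothetical end-to-end helper: affinity -> power-iteration embedding -> k-means."""
    return kmeans_1d(power_iteration_embedding(gaussian_affinity(points, sigma)), k)


if __name__ == "__main__":
    print(pic([0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 5.3], k=2))
```

On the toy input above, the first three points should share one label and the last four the other. Each iteration is a single matrix-vector product, which is what makes the approach easy to distribute across a cluster machine; a practical version would use sparse matrices rather than nested lists.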