GrantIndex

Cluster Analysis, Predictive Distributions, and Stochastic Search Algorithms

$448,554 | FY2004 | MPS | NSF

University Of Florida, Gainesville FL

Investigators

Abstract

Cluster analysis is a widely used exploratory tool for finding patterns in data. The basic goal of a cluster analysis is to separate m distinguishable objects (based on measurements associated with them) into groups, or clusters, such that the objects within each group are "similar" while the groups themselves are "different." The number of possible partitions of m objects grows extremely quickly with m, so an exhaustive search for the best partition quickly becomes computationally impossible. Most standard methods, such as hierarchical and K-means clustering, (a) sacrifice an extensive search of the possible partitions for speed of implementation; (b) fail to optimize an objective function globally; and (c) generally return a single answer, even though there may be many equally good answers that are all relevant to the application.

This investigation will examine: (i) the improvement in clustering performance attainable through data smoothing; (ii) a model-based approach that simultaneously smooths the data and provides a natural objective function for ranking partitions; and (iii) strategies for conducting a stochastic search using high-speed computing and Markov chain Monte Carlo algorithms. The proposed methodology has already been applied successfully in some examples.

Cluster analysis has recently seen renewed interest due, in part, to its applications in bioinformatics, where it can be used with microarray analysis to identify groups of genes that can be linked to certain diseases. For example, the presence or absence of certain genes could predispose a person to certain types of cancer, or indicate greater post-operative risk from certain procedures. The benefits to society of the proposed project therefore include a better understanding of these relationships, made possible by the improved algorithms, and a clearer picture of the links between genes and diseases.
Graduate students will also be trained to develop these methods further. Other researchers, trained in these new methods, will find their own investigations enhanced.
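The abstract's two computational points, that the number of partitions explodes with m and that a stochastic search can stand in for exhaustive enumeration, can be illustrated with a short sketch. The number of partitions of m objects is the Bell number B_m, computed below with the Bell triangle. The search is a plain Metropolis walk over cluster labels. This is a minimal illustration under stated assumptions, not the investigators' method: the abstract does not specify its model-based objective, so within-cluster sum of squares with a fixed number of clusters k is assumed as a stand-in, and the names `bell_numbers`, `within_ss`, `stochastic_search` and the parameters `n_iter`, `temperature`, `seed` are hypothetical.

```python
import math
import random


def bell_numbers(n_max):
    """Return [B_0, ..., B_n_max], where B_m counts the partitions of m objects.

    Bell triangle: each row starts with the last entry of the previous row;
    each later entry is its left neighbour plus the entry directly above
    that neighbour. The first entry of row m is B_m.
    """
    bells = [1]
    row = [1]
    for _ in range(n_max):
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
        bells.append(row[0])
    return bells


def within_ss(data, labels, k):
    """Assumed stand-in objective: total within-cluster sum of squares (1-D data)."""
    total = 0.0
    for c in range(k):
        members = [x for x, lab in zip(data, labels) if lab == c]
        if not members:
            continue  # empty clusters contribute nothing
        mean = sum(members) / len(members)
        total += sum((x - mean) ** 2 for x in members)
    return total


def stochastic_search(data, k, n_iter=5000, temperature=1.0, seed=0):
    """Hypothetical Metropolis search over labelings, targeting exp(-score / temperature).

    Proposal: pick one object uniformly and give it a uniformly random label
    (a symmetric proposal). Returns the best labeling visited and its score.
    """
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in data]
    score = within_ss(data, labels, k)
    best_labels, best_score = labels[:], score
    for _ in range(n_iter):
        i = rng.randrange(len(data))
        proposal = labels[:]
        proposal[i] = rng.randrange(k)
        new_score = within_ss(data, proposal, k)
        # Always accept improvements; accept worse moves with Metropolis probability.
        if new_score <= score or rng.random() < math.exp((score - new_score) / temperature):
            labels, score = proposal, new_score
            if score < best_score:
                best_labels, best_score = labels[:], score
    return best_labels, best_score


if __name__ == "__main__":
    print(bell_numbers(10)[-1])  # 115975 partitions of only 10 objects
    data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]  # toy data: two well-separated groups
    labels, score = stochastic_search(data, k=2)
    print(labels, round(score, 4))
```

Tracking the best state visited, rather than only the final state, matters because the chain is allowed to step away from good partitions. Since the abstract stresses that many partitions may be equally good, a fuller version might keep a ranked list of distinct high-scoring partitions instead of a single best one.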
