Bayesian Mixture Models: Unified Theoretical Frameworks and MCMC Methods

$150,000 | FY2009 | MPS | NSF

University Of Missouri-Columbia, Columbia MO

Abstract

This award is funded under the American Recovery and Reinvestment Act of 2009 (Public Law 111-5).

Different kinds of Bayesian mixture models, such as finite mixtures, finite and infinite hidden Markov models, Dirichlet process models, and analysis of densities models, along with their methods of inference, have traditionally been developed from isolated perspectives. There have been few attempts to view these models from a common standpoint. For the massive datasets increasingly encountered in real-world studies, the inferential techniques for these models often impose computational burdens heavy enough to preclude fully Bayesian solutions. The proposed research will (i) achieve a unification by formulating general classes of mixture models whose special cases include many common mixture models, (ii) explore from a common standpoint important theoretical properties of generalized mixture models, such as posterior consistency, and discover asymptotics that form the basis of broadly applicable and cost-effective inferential strategies, (iii) develop efficient Markov chain Monte Carlo techniques for fitting generalized mixture models to large datasets, (iv) develop user-friendly statistical software that implements these methods, with the goal of disseminating them to researchers, and (v) apply the proposed methods to analyze publicly available, high-throughput Comparative Genomic Hybridization (CGH) data on various kinds of cancer.

Bayesian mixture models are ubiquitous in statistical applications because of their ability to capture real-world complexities through relatively simple constructions. These models have found application in areas as diverse as computer science, epidemiology, economics, finance, forestry, genetics, and marketing, and the range of applications has expanded rapidly in the last decade. Through its theoretical and methodological components, this research will establish key characteristics shared by disparate classes of mixture models. It will enable the use of mixture models in applications where fitting them is currently difficult, if not impossible, because of the sheer volume of data. The investigator will ensure effective dissemination of the research through open-access software development, publication in leading journals, application of the proposed methods to the analysis of microarray-based cancer data, and presentation of the results at statistical and subject-matter conferences.
