Bayesian Models for Gene Expression with Microarray Data

$287,363 | R01 | FY2005 | CA | NIH

Texas A&M University System, College Station TX

Linked publications & trials

Paper 21165171 Paper 20495685 Paper 19750023 Paper 19673858 Paper 19609371 Paper 19444335 Paper 19444331 Paper 19430598 Paper 19272947 Paper 18392118 Paper 18203582 Paper 18024478 Paper 17889993 Paper 17725810 Paper 17077137 Paper 16870934

Abstract

DESCRIPTION (provided by applicant): This project is concerned with parametric and semiparametric modeling of gene expression data. DNA microarrays and other high-throughput methods for analyzing complex nucleic acid sequences now make it possible to measure the expression levels of many genes in a biological sample rapidly, efficiently and accurately. The main difficulty with microarray data analysis is that the sample size is very small compared to the dimension of the problem (the number of genes). The number of genes measured for a single individual is usually in the thousands, whereas the data set contains few individuals. We propose several novel parametric Bayesian modeling approaches for gene selection, tumor classification, Bayesian networks, gene clustering and dimension reduction. Most of the existing methods are not model-based and thus cannot address specific questions regarding formal assessment of uncertainties or assessment of the fit of a specific model. Model-based approaches also offer the potential for extension to more complex situations, e.g., probabilistic mixture modeling and handling missing data. We will develop Bayesian hierarchical models for microarray data, which will flexibly accommodate several modeling factors at different levels. In several of the modeling frameworks, we will treat the dimension of the model space as unknown to create added flexibility. Analytical answers are not available for these flexible classes of models, so simulation-based Markov chain Monte Carlo (MCMC) methodology with dimension-jumping algorithms will be used to derive estimates (uncertainty distributions) of the unknown parameters.
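As a rough illustration of the simulation-based approach described in the abstract, the following Python sketch runs a Metropolis sampler over gene-inclusion indicators, so that the number of selected genes is itself unknown and changes whenever the chain adds or drops a gene. It is not the applicants' method: the linear model, the Zellner g-prior, the prior inclusion probability, the cap on model size, the synthetic data, and the names log_marginal and select_genes are all assumptions made for this example. Because the regression coefficients are integrated out analytically, it is also a simplification of a full dimension-jumping (reversible jump) algorithm.

```python
import numpy as np


def log_marginal(X, y, gamma, g):
    """Log marginal likelihood of y (up to a constant) for the genes flagged in gamma.

    Assumed model (illustrative only): y = X_gamma @ beta + noise, with a
    Zellner g-prior on beta and the prior p(sigma^2) proportional to 1/sigma^2.
    """
    n = y.shape[0]
    idx = np.flatnonzero(gamma)
    k = idx.size
    yty = float(y @ y)
    if k == 0:
        return -0.5 * n * np.log(yty)
    Xg = X[:, idx]
    # Least squares gives y' P y without forming the projection matrix P.
    coef, *_ = np.linalg.lstsq(Xg, y, rcond=None)
    fit = float(y @ (Xg @ coef))
    shrunk_rss = yty - g / (1.0 + g) * fit
    return -0.5 * k * np.log1p(g) - 0.5 * n * np.log(shrunk_rss)


def select_genes(X, y, n_iter=5000, g=None, prior_incl=0.05, max_size=5, seed=0):
    """Estimate posterior inclusion probabilities with single-flip Metropolis moves."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    g = float(n) if g is None else g
    gamma = np.zeros(p, dtype=bool)          # start from the empty model
    log_odds = np.log(prior_incl) - np.log1p(-prior_incl)
    current = log_marginal(X, y, gamma, g)
    counts = np.zeros(p)
    for _ in range(n_iter):
        j = rng.integers(p)                  # pick one gene at random
        proposal = gamma.copy()
        proposal[j] = not proposal[j]        # add or drop it: model size changes by one
        if proposal.sum() <= max_size:       # models above the cap have prior mass zero
            candidate = log_marginal(X, y, proposal, g)
            log_ratio = candidate - current + (log_odds if proposal[j] else -log_odds)
            if np.log(rng.random()) < log_ratio:
                gamma, current = proposal, candidate
        counts += gamma
    return counts / n_iter


# Synthetic data (assumed, not from the project): 20 samples, 200 genes,
# with the response driven by genes 3 and 17 only.
rng = np.random.default_rng(1)
n_samples, n_genes = 20, 200
X = rng.standard_normal((n_samples, n_genes))
y = 3.0 * X[:, 3] - 3.0 * X[:, 17] + 0.5 * rng.standard_normal(n_samples)
y = y - y.mean()                             # the sketch has no intercept term

probs = select_genes(X, y)
print("Genes with the highest estimated inclusion probability:", np.argsort(probs)[::-1][:5])
```

The returned vector estimates, for each gene, the fraction of sampled models that include it. With so few samples relative to genes, these estimates can be sensitive to the assumed prior settings (g, prior_incl, max_size) and to the length of the chain.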
