CAREER: Optimal Bayesian Methods for Classification

$157,638 | FY2015 | CSE | NSF

Ohio State University, The, Columbus OH

Abstract

Model-free classification has too often produced unreliable and irreproducible predictions that impede progress in high-throughput genomic studies. In 2011, the Director of the FDA Center for Drug Evaluation and Research, Janet Woodcock, estimated that as much as 75% of published biomarker associations are not replicable. The root cause is inattention to the properties and limitations of estimates of the misclassification rate, which numerous studies have shown to perform poorly in the high-dimensional, small-sample setting typically encountered in genomics. To address the problem of predictive and replicable scientific discovery in biomedicine and beyond, this CAREER research project develops Bayesian computational and statistical methods for small-sample classification. The specific research objectives are threefold. First, the work involves developing methods of transforming scientific knowledge, for instance biological knowledge in the form of gene regulatory pathways, into Bayesian models to enhance classifier design and analysis. Integrating a prior with observable data is key to improving prediction accuracy and error estimation in small-sample settings. Second, the investigators study rates of convergence for both classification and error estimation and develop approximations and bounds on the expected number of samples necessary for satisfactory performance. Finally, this project develops Bayesian methods for optimal feature set selection and performance analysis. Feature selection is a requisite step in disease biomarker discovery because of the high dimensionality of the data captured by modern measurement technologies. These timely advances in small-sample classification applied to genomics greatly impact health and wellness in society, with direct applications in materials discovery and other areas in science, engineering and statistics.
Furthermore, a key component of the outreach agenda of this project is training and educating students at both the undergraduate and graduate levels, particularly through the development of a book and course on pattern recognition that incorporate research results and practical applications from this project.
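
The abstract's point that integrating a prior with observable data helps in small-sample settings can be illustrated with a toy example. The sketch below is not the project's method; it assumes a single binary feature (say, a gene being "on" or "off"), a Beta prior on the probability that the feature is on in each class, and equal class priors. The function names, prior parameters, and sample counts are all hypothetical.

```python
from fractions import Fraction

def posterior_mean(ones, n, a=1, b=1):
    """Posterior mean of a Bernoulli parameter under a Beta(a, b) prior,
    after observing `ones` successes in `n` samples."""
    return Fraction(a + ones, a + b + n)

def classify(x, p0, p1, c0=Fraction(1, 2)):
    """Return the class (0 or 1) with the larger posterior probability
    for a binary observation x, given per-class 'on' probabilities."""
    like0 = p0 if x == 1 else 1 - p0
    like1 = p1 if x == 1 else 1 - p1
    return 0 if c0 * like0 >= (1 - c0) * like1 else 1

# Hypothetical small sample: 5 observations per class.
# Assumed prior knowledge: the gene is rarely on in class 0, usually on in class 1.
p0 = posterior_mean(ones=1, n=5, a=2, b=8)
p1 = posterior_mean(ones=4, n=5, a=8, b=2)

print(p0, p1)               # posterior means for each class
print(classify(1, p0, p1))  # predicted class when the gene is on
print(classify(0, p0, p1))  # predicted class when the gene is off
```

With only five samples per class, the prior counts (a + b = 10) carry more weight than the data. As the sample size grows, the posterior mean approaches the empirical frequency and the prior's influence fades.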
