Differential Abundance Methods for Large Heterogeneous-Featured Metabolomics Datasets

$145,716 R03 FY2016 CA NIH

University of Kentucky, Lexington, KY

Investigators

Linked publications & trials

Paper 36949395 Paper 35448492 Paper 31964922 Paper 30830442

Abstract

Project Summary: Metabolomics deals with the systematic identification and quantification of small molecules in biological systems. Metabolomics studies frequently aim to identify metabolites whose abundances differ between two or more conditions. However, in the very large untargeted metabolomics datasets generated today, many detected metabolite features are zero for a large fraction of samples in one or both sample classes, creating data sparsity. Statistical methods have previously been developed to test for differential abundance in metabolomics datasets with high data sparsity (i.e., a large fraction of zero values). However, these methods are not appropriate for data from matched-pair experimental designs, which are expected to become the standard as metabolomics is applied to more human disease studies. Furthermore, the currently available methods either make simplistic statistical assumptions or use the simplest available method that avoids assumptions about the data, neither of which is necessarily appropriate. In addition, peak assignment and correspondence ambiguities play a large role in the zero values and redundancy seen in these datasets, yet no methods have been developed to address these issues directly. In this proposal, we will develop novel informatics and statistical methods that address these distinct issues in large heterogeneous-featured metabolomics datasets: i) a fuzzy set-based algorithm that addresses peak assignment and correspondence ambiguities and ii) a semi-parametric method for differential abundance analysis of metabolomics datasets with high data sparsity, possibly non-normally distributed data, and matched-pair experimental designs. We will use simulation studies to assess how well these new methods address the aforementioned issues and how much they improve the power of differential abundance analysis. Finally, we will make these new methods available through Bioconductor packages and a web-based service.
