Wisdom of Crowds: Integrating Gene Set Enrichment Studies Involving RNA-Seq.
Southern Methodist University, Dallas TX
Abstract
DESCRIPTION: Gene Set Enrichment Analysis (GSEA) aims to identify essential pathways or, more generally, sets of biologically related genes that are involved in complex human diseases. Because of the many advantages it offers, GSEA has proved crucial in systems biology studies that can lead to an integrated understanding of the fundamental biological processes underlying disease pathogenesis, of the elements defining therapeutic targets, and of responses to treatment selections. However, despite its potential importance in promoting human health, conclusions of GSEA drawn from isolated studies are often sparse, and different studies may lead to inconsistent and sometimes contradictory results. This problem is largely related to the following limitations. First, studies have shown that isoform-specific expression variations play important roles in complex human diseases, yet the microarray technology traditionally used for mRNA profiling often lacks the resolution needed to measure isoform-specific expression. Second, sample sizes of individual genome-wide transcriptomic studies are typically insufficient relative to the overwhelming number of genes. With the advent of next-generation sequencing (NGS) technologies, it has become possible to measure genome-wide isoform-specific expression levels, calling for new methods that can exploit this unprecedented resolution. Further, enormous amounts of data have been generated by various microarray and RNA-Seq experiments, and the volume continues to grow rapidly. Together, these developments create a strong demand for methods of integrative GSEA (iGSEA) that make explicit use of isoform-specific expression and combine multiple relevant studies, in order to avoid indecisive or potentially conflicting conclusions from individual data sets and thereby enhance the power, reproducibility and interpretability of the analysis.
The goal of this project is to develop novel statistical methods and bioinformatics tools for iGSEA that efficiently synthesize diverse mRNA expression data from studies involving newly emerging RNA-Seq experiments as well as conventional microarray experiments, with an emphasis on integrating isoform-specific expression. In Aim 1, we will develop an innovative meta-analysis method for iGSEA using isoform-specific expression. Specifically, we will incorporate ideas from fixed-effects and random-effects models, newly proposed and tested for meta-analysis of genome-wide association studies, into iGSEA, in order to achieve the maximum possible statistical efficiency while allowing for the inclusion of heterogeneous studies. Aim 2 will propose robust meta-analysis methods to integrate both isoform- and gene-level expression data from a variety of sources. Aim 3 will develop a fully integrated Bayesian method to incorporate existing biological information more effectively. A powerful Bayesian hierarchical approach will be proposed to jointly model different sources of information. This will not only substantially improve the power of iGSEA, but also simultaneously reveal interesting genes and gene sets, as well as the 'responsible' isoforms of each identified gene.
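To make the distinction between the two model families named in Aim 1 concrete, the following is a minimal sketch of the textbook inverse-variance fixed-effect estimator and the DerSimonian-Laird random-effects estimator. It is not the method proposed in this project, which the abstract does not specify. The inputs are assumptions for illustration: a hypothetical per-study enrichment effect size for one gene set, with its sampling variance, and the function names are invented here.

```python
from typing import List, Sequence, Tuple


def fixed_effect(effects: Sequence[float],
                 variances: Sequence[float]) -> Tuple[float, float]:
    """Inverse-variance fixed-effect pooling.

    Returns (pooled_effect, variance_of_pooled_effect).
    Assumes every study estimates the same underlying effect.
    """
    weights: List[float] = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total
    return pooled, 1.0 / total


def random_effects_dl(effects: Sequence[float],
                      variances: Sequence[float]) -> Tuple[float, float, float]:
    """DerSimonian-Laird random-effects pooling.

    Returns (pooled_effect, variance_of_pooled_effect, tau_squared),
    where tau_squared is the estimated between-study variance.
    """
    k = len(effects)
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    pooled_fe, _ = fixed_effect(effects, variances)

    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(w * (e - pooled_fe) ** 2 for w, e in zip(weights, effects))
    c = sum_w - sum(w * w for w in weights) / sum_w

    # Method-of-moments estimate of between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Re-weight each study by its total (within + between) variance.
    re_weights = [1.0 / (v + tau2) for v in variances]
    sum_re = sum(re_weights)
    pooled_re = sum(w * e for w, e in zip(re_weights, effects)) / sum_re
    return pooled_re, 1.0 / sum_re, tau2


if __name__ == "__main__":
    # Hypothetical enrichment effect sizes for one gene set in three studies.
    study_effects = [0.2, 0.5, 0.8]
    study_variances = [0.04, 0.04, 0.04]
    print(fixed_effect(study_effects, study_variances))
    print(random_effects_dl(study_effects, study_variances))
```

When the studies disagree more than their sampling variances explain, the random-effects estimate carries a larger variance than the fixed-effect one, which is the trade-off between efficiency and tolerance of heterogeneity that Aim 1 refers to.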