ABI Innovation: doseR: a novel framework for dosage compensation and global expression analysis
University Of Kansas Center For Research Inc, Lawrence KS
Abstract
This research aims to develop and make broadly available a novel statistical methodology for analyzing patterns of gene expression that result from differences in chromosome copy number, particularly those that arise on the sex chromosomes (e.g., XX females versus XY males). A difference in chromosome copy number changes the gene dose for those chromosomes and often produces a corresponding change in gene expression, termed a "dosage effect". However, many organisms have evolved special "dosage compensation" mechanisms that mitigate these dosage effects, such as those between males and females on the sex chromosomes. There is increasing interest in understanding which organisms employ dosage compensation mechanisms, and why. Recent advances in genome-wide assays of gene expression have greatly expanded the range of taxa that can be assayed for dosage compensation, but analytical approaches for such data are varied, inconsistently applied, and typically do not make full use of the information available in the data. The methods and software developed in this project, called "doseR", will provide a cohesive and comprehensive solution to each of these issues. As such, the "doseR" project fills a major gap in analytical methods for genomic investigations of dosage compensation. Furthermore, beyond dosage compensation analysis, this methodology can be generalized to identify broad shifts in gene expression between groups of genes that arise under different biological conditions. Thus, development and deployment of doseR will bridge the gap between biological intuition and bioinformatic inference, not only for dosage compensation, but also for many as yet unforeseen lines of inquiry. This project also creates several training opportunities for undergraduate students, including intensive bioinformatic training workshops and direct participation in research activities.
This research aims to develop and make broadly available a novel linear-modeling statistical methodology for analyzing sex-chromosome dosage compensation using genome-wide RNA-seq expression data. The statistical approaches currently employed for such analyses are far from ideal given the nature of the data and the desired set of inferences. Currently, biological replicates are averaged into a single measurement per gene and heavily normalized. Chromosome-specific effects on expression (sex chromosome relative to autosomes) are then evaluated using absolute expression, while gene dosage effects are assessed using expression ratios, in both cases with non-parametric statistical tests. A more statistically robust approach is to employ linear mixed-effects modeling of gene expression. This provides a unified statistical framework for assessing the magnitude and significance of both chromosome-specific and dosage effects on gene expression. Moreover, the model is applied directly to the sequencing read counts for each gene and incorporates scaling factors such as sequencing depth and transcript length into the models describing the data, as is done in most analyses of differential expression. As such, extensive normalization is avoided and replicates are readily incorporated into the analysis. These methods will be implemented in a new software package, named "doseR", written in the R statistical programming language and distributed as part of the Bioconductor suite of bioinformatic software tools. Performance of the new statistical model and its software implementation, relative to previous methods of assessing dosage compensation, will be evaluated through extensive simulations of RNA-sequencing data. Application to specific empirical data sets relevant to dosage compensation will also be examined and evaluated.
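The modeling idea described above can be illustrated with a minimal sketch. It is not the doseR implementation or its API, and it deliberately simplifies the proposed linear mixed-effects model to a fixed-effects, saturated 2x2 design (chromosome class by sex) fitted on log read counts, with sequencing depth and transcript length entering as offsets. In a saturated design the least-squares coefficients are simply contrasts of cell means, so no fitting library is needed. The function names, the 0.5 pseudocount, and the toy counts are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def log_rate(count, depth, length_bp):
    """Log expression rate with depth and transcript length as offsets.

    The 0.5 pseudocount (an assumption of this sketch) keeps the log
    defined when a gene has zero reads.
    """
    return math.log(count + 0.5) - math.log(depth) - math.log(length_bp)


def saturated_fit(records):
    """Fit log_rate ~ chromosome * sex by contrasts of cell means.

    records: iterable of (count, depth, length_bp, on_x, is_male).
    Baseline cell is autosomal genes in females.
    """
    cells = defaultdict(list)
    for count, depth, length_bp, on_x, is_male in records:
        cells[(bool(on_x), bool(is_male))].append(
            log_rate(count, depth, length_bp)
        )
    mean = {cell: sum(vals) / len(vals) for cell, vals in cells.items()}

    intercept = mean[(False, False)]
    chromosome = mean[(True, False)] - intercept   # X vs autosome, in females
    sex = mean[(False, True)] - intercept          # male vs female, autosomes
    # Dosage term: extra male-vs-female shift specific to X-linked genes.
    interaction = (mean[(True, True)] - mean[(True, False)]) - sex
    return {
        "intercept": intercept,
        "chromosome": chromosome,
        "sex": sex,
        "interaction": interaction,
    }


# Hypothetical toy data: one female library (2e6 reads) and one male
# library (1e6 reads); two autosomal and two X-linked genes. Male X-linked
# counts are halved relative to depth, i.e. no dosage compensation.
FEMALE_DEPTH, MALE_DEPTH = 2_000_000, 1_000_000
records = [
    # (count, depth, length_bp, on_x, is_male)
    (200, FEMALE_DEPTH, 1000, False, False),
    (400, FEMALE_DEPTH, 2000, False, False),
    (100, MALE_DEPTH, 1000, False, True),
    (200, MALE_DEPTH, 2000, False, True),
    (200, FEMALE_DEPTH, 1000, True, False),
    (400, FEMALE_DEPTH, 2000, True, False),
    (50, MALE_DEPTH, 1000, True, True),
    (100, MALE_DEPTH, 2000, True, True),
]

fit = saturated_fit(records)
print({name: round(value, 3) for name, value in fit.items()})
```

In this toy example the interaction term comes out close to log(0.5), the value expected when a single male X receives no compensation, while the autosomal sex term stays near zero even though the two libraries differ twofold in depth. A full treatment along the lines proposed here would additionally model replicate-level and gene-level variation as random effects and attach significance tests to each coefficient.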
While the immediate motivation for software development is dosage compensation analysis, the proposed methodology can be employed in any analytical scenario requiring the detection of directional shifts in expression for multiple, specific subsets of genes. It therefore provides a tool with broad utility in systems biology research. Status and results of this project can be found at https://walterslab.github.io/doseR/.