SHAPEIT+Salmon: haplotype phasing and RNA-seq quantification for allele-specific eQTL mapping
Carnegie-Mellon University, Pittsburgh PA
Abstract
Allele-specific expression quantitative trait locus (eQTL) mapping has become increasingly popular because it enhances traditional eQTL mapping by revealing in significantly more detail the gene regulatory mechanisms underlying the genetic architecture of diseases. Allele-specific eQTL mapping identifies cis-acting and trans-acting eQTLs, which point to cis-regulatory elements and trans-acting factors respectively, by leveraging the fact that, unlike trans-acting eQTLs, cis-acting eQTLs affect the expression of transcripts from the same haplotype as the variant itself, causing allelic imbalance in expression. However, allele-specific eQTL mapping requires reliable long-range phasing of genome sequences and accurate allele-specific expression quantification from RNA-seq data that is consistent with the genome phasing. Most existing work has treated allele-specific expression quantification and phasing as independent tasks, even though each can enhance the accuracy of the other. In this proposed research, we will modify and pair two widely used tools, SHAPEIT for genome phasing and Salmon for RNA-seq quantification, to obtain accurate phasing and allele-specific expression quantification that are consistent with each other for allele-specific eQTL mapping. The combined tool will inherit or enhance the accuracy and efficiency of the two original methods. If phased sequences are known from experimental or trio data, we will replace the EM algorithm of Salmon with an accelerated EM to address the extreme multi-mapped read problem with computational efficiency. If phased sequences are not available, as in unrelated individuals, we will modify SHAPEIT to jointly phase the variants and allele-specific read abundances, embedding allele-specific expression quantification within SHAPEIT and using Salmon to obtain transcript quantification and allele-specific read abundances.
As a testbed, we will use genotype and RNA-seq data from a 50-generation intercross between two inbred mouse strains. Because these data are derived from two fully sequenced inbred founders, the correct phase is known. Although we use mice as a testbed, our approach is applicable to data from any disease, tissue, or organism, including GTEx data.
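To make the multi-mapped read problem concrete, the following is a minimal sketch of a plain EM that apportions reads between two haplotype-specific copies of a transcript. It is an illustration under stated assumptions, not Salmon's implementation and not the accelerated EM proposed above: the function name `em_allele_abundance`, the uniform read-compatibility model, and the toy read counts are all hypothetical.

```python
# Illustrative sketch only: a plain EM for apportioning multi-mapped reads
# among haplotype-specific transcripts. This is NOT Salmon's code and NOT the
# accelerated EM proposed in the abstract; all names here are hypothetical.

def em_allele_abundance(compat, n_targets, n_iter=200, tol=1e-10):
    """Estimate relative abundances for n_targets haplotype-specific transcripts.

    compat: list of reads; each read is a list of the target indices it maps
            to (uniform compatibility is assumed for simplicity).
    Returns a list of abundances that sums to 1.
    """
    theta = [1.0 / n_targets] * n_targets
    for _ in range(n_iter):
        counts = [0.0] * n_targets
        # E-step: split each read across its compatible targets in
        # proportion to the current abundance estimates.
        for targets in compat:
            denom = sum(theta[t] for t in targets)
            for t in targets:
                counts[t] += theta[t] / denom
        # M-step: renormalise the expected counts into abundances.
        total = sum(counts)
        new_theta = [c / total for c in counts]
        converged = max(abs(a - b) for a, b in zip(new_theta, theta)) < tol
        theta = new_theta
        if converged:
            break
    return theta


# Toy data (made up): target 0 is the haplotype A copy, target 1 the
# haplotype B copy. Reads overlapping a heterozygous site map uniquely;
# the remaining reads are compatible with both copies.
reads = [[0]] * 30 + [[1]] * 10 + [[0, 1]] * 60
theta = em_allele_abundance(reads, n_targets=2)
allelic_ratio = theta[0] / (theta[0] + theta[1])
```

On this toy input the estimate converges to an allelic ratio of about 0.75, the same 3:1 imbalance seen in the uniquely mapped reads. The ambiguous reads inherit the imbalance rather than diluting it, which is why quantification has to agree with the phasing.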