Computational framework for identifiable and phase-consistent allele-specific expression quantification

$701,728 | FY2022 | BIO | NSF

Carnegie Mellon University, Pittsburgh PA

Investigators

Abstract

Haplotype inference and allele-specific transcript expression quantification are two fundamental problems in genetics and genomics. Haplotype inference aligns the maternal and paternal alleles of genetic variants along the two homologous chromosomes of a diploid genome, whereas allele-specific expression quantification estimates the expression levels of transcripts of maternal and paternal origin from RNA-seq reads. The two problems are coupled, because each affects the accuracy of the other: accurate allele-specific expression quantification requires accurate haplotypes to which RNA-seq reads can be mapped, and allele-specific RNA-seq reads can in turn improve the accuracy of haplotype inference. Whereas existing work has treated these two problems separately, this project addresses them jointly in a single statistical framework to improve the accuracy of both the inferred haplotypes and the allele-specific expression estimates. The computational methods to be developed in this research will advance areas of biological research that require accurate allele-specific expression estimates and haplotypes, including mapping allele-specific eQTLs, detecting imprinted genes, imputing untyped variants, finding signatures of natural selection, and detecting recombination events. The outcome of the research will be used in outreach activities at minority-serving institutions to recruit graduate students.

The project develops a computational framework for obtaining accurate allele-specific expression measurements and haplotypes from RNA-seq and genotype data. Two existing frameworks, one for transcript expression quantification and the other for haplotype inference (e.g., Beagle), are combined into a single framework that retains the computational efficiency of the originals. Each of the two existing frameworks is modified to address a previously unmet challenge regarding allele-specific reads. For RNA-seq quantification, the project develops a mathematically rigorous approach to obtaining identifiable allele-specific expression estimates at the gene level, the transcript-set level, or the individual-transcript level. For haplotype inference, the project couples the model in Beagle with the investigators' RNA-seq quantification methods to jointly estimate identifiable allele-specific expression levels and haplotypes that are consistent with each other. The computational methods are benchmarked on allele-specific eQTL mapping, using genotypes and RNA-seq reads from human trios and LG/SM intercross mice with known haplotypes. The outcome of the research is available at http://www.cs.cmu.edu/~sssykim. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
