Fast and accurate phasing using the positional Burrows-Wheeler transform (PBWT)

$238,438 | R21 | FY2017 | HG | NIH

Harvard School Of Public Health, Boston MA

Investigators

Abstract

Phasing, defined as the estimation of haplotypes from diploid genotype data, is a fundamental problem in medical and population genetics. Phasing is a key preprocessing step for the genotype imputation algorithms employed in genome-wide association studies of diseases and complex traits. It is also important for mapping molecular QTL using allele-specific reads, detecting clonal mosaicism, inferring population structure, and detecting natural selection. Considerable resources have been invested in developing accurate phasing algorithms, but unsolved challenges remain, including (i) incorporating large reference panels, such as that of the Haplotype Reference Consortium, to improve phasing accuracy (reference-based phasing), and (ii) phasing extremely large cohorts using within-cohort data (cohort-based phasing). Here, we propose an exploratory two-year research program in which we will develop methods and software for both reference-based and cohort-based phasing, using a new data structure based on the Positional Burrows-Wheeler Transform (PBWT). We aim to make fast and accurate phasing methods and software freely available to all researchers via public phasing servers. We will also explore the early, conceptual stages of developing PBWT-based methods for reference-based imputation. Our team has multiple strengths: our statistical and computational expertise; our track record of producing practical, publicly available software packages for a broad range of applications in statistical genetics that are widely used by the community; and our data-driven approach to methods research. We will guide our methods development using data from 500,000 UK Biobank samples and will work closely with the Haplotype Reference Consortium (see letters of support).
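The abstract names the PBWT as the basis of the proposed methods but does not describe how it works. For background only, here is a minimal sketch of the standard PBWT prefix- and divergence-array construction published by Durbin (2014). It assumes a biallelic haplotype panel coded as 0/1 and is not the phasing method proposed in this grant. The function name `pbwt` and the toy panel are hypothetical illustrations. At each site k, haplotypes are stably partitioned by their allele, which keeps them sorted by their reversed prefixes. The divergence value d_k[i] records the site at which the match between haplotype a_k[i] and its sorted predecessor begins, so k - d_k[i] is the length of that match.

```python
from typing import List, Tuple


def pbwt(haplotypes: List[List[int]]) -> Tuple[List[List[int]], List[List[int]]]:
    """Hypothetical helper: build PBWT prefix arrays a_k and divergence arrays d_k.

    Assumes every haplotype is a list of 0/1 alleles of equal length.
    a_k orders haplotype indices by their reversed prefixes over sites 0..k-1;
    d_k[i] is the first site of the match between a_k[i] and a_k[i-1]
    (d_k[i] == k means there is no shared match ending at site k-1).
    """
    m = len(haplotypes)
    n_sites = len(haplotypes[0])
    a = list(range(m))  # a_0: original order
    d = [0] * m         # d_0: all zeros
    prefix_arrays = [a]
    divergence_arrays = [d]

    for k in range(n_sites):
        zeros, ones = [], []
        d_zeros, d_ones = [], []
        # p and q track the running maximum divergence since the last
        # haplotype placed in the 0-group and 1-group, respectively.
        p = q = k + 1
        for i, idx in enumerate(a):
            p = max(p, d[i])
            q = max(q, d[i])
            if haplotypes[idx][k] == 0:
                zeros.append(idx)
                d_zeros.append(p)
                p = 0
            else:
                ones.append(idx)
                d_ones.append(q)
                q = 0
        # Stable partition by the allele at site k gives a_{k+1} and d_{k+1}.
        a = zeros + ones
        d = d_zeros + d_ones
        prefix_arrays.append(a)
        divergence_arrays.append(d)

    return prefix_arrays, divergence_arrays


if __name__ == "__main__":
    panel = [[0, 1, 0], [1, 1, 0], [0, 1, 1], [0, 0, 0]]  # toy 4 x 3 panel
    a, d = pbwt(panel)
    print(a[-1])  # [3, 0, 1, 2]
    print(d[-1])  # [3, 2, 1, 3]
```

In this toy panel, haplotypes 0 and 1 end up adjacent after the last site, and their divergence value of 1 means they match over sites 1 and 2. Because each step is a single linear pass over the haplotypes, this adjacency of long matches is what makes PBWT-based searches for closely matching haplotypes fast in large panels.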

View original record on NIH RePORTER →