
Overcoming bias and unwanted variability in next generation sequencing

$600,000 · R01 · FY2016 · HG · NIH

Dana-Farber Cancer Inst, Boston MA

Abstract

DESCRIPTION (provided by applicant): Next Generation Sequencing (NGS) has become the most widely used high-throughput technology in biology. Today, NGS applications go far beyond genome sequencing and studies of DNA sequence itself to include the measurement of quantitative and dynamic outcomes underlying genomic function in development and disease. These measurements, specifically RNA abundance, protein binding, DNA methylation, and microbiome composition, are at the core of studies undertaken by large consortia and individual labs alike. However, when measuring these quantitative outcomes, NGS data are subject to severe technological and biological biases, systematic errors, and unforeseen variability, which can greatly impact downstream analyses. Only when these issues can be readily identified and addressed will the technology maximally benefit science and medicine. Our group has extensive experience developing statistical methods that transform raw high-throughput data into the ultimate measurements relied upon by biologists and clinicians. Our gene expression array preprocessing methods are practically an industry standard, and our recent work on NGS applications is widely cited and used. Furthermore, Dr. Irizarry co-leads the Bioconductor project, one of the most widely used open-source projects for the development and dissemination of state-of-the-art statistical methodology. We propose to continue to leverage our experience with high-throughput technologies to develop indispensable analysis tools for NGS data in four critical, widely used applications urgently requiring reliable statistical analysis tools. At the core of our methods is the common need, across these four applications, to overcome bias, systematic error, and unforeseen variability. To aid in the development and assessment of these tools, we propose experiments specifically designed to serve as benchmarks.
These problems are well matched to our specific expertise, and we will address them with the following aims. 1) Develop statistical methods for RNA transcript estimation that are robust to sequencing artifacts. 2) Develop statistical methods that estimate heterogeneous cell composition in DNA methylation data. 3) Develop statistical methods for unbiased quantification in microbial community 16S rRNA gene sequencing studies. 4) Develop methods that account for protocol-induced bias in genome-wide enrichment scans (e.g., ChIP-seq and DNase I-seq).
