Overcoming bias and unwanted variability in next generation sequencing

$600,000 | R01 | FY2016 | HG | NIH

Dana-Farber Cancer Inst, Boston MA

Investigators

Linked publications & trials

Paper 38978101 Paper 37943166 Paper 37429993 Paper 36050488 Paper 35770795 Paper 34875227 Paper 33997789 Paper 33944776 Paper 33603203 Paper 32841603 Paper 32620142 Paper 32513234 Paper 32181824 Paper 32169095 Paper 31870412 Paper 31783894 Paper 31607556 Paper 31575744 Paper 31409274 Paper 31164141 Paper 30821316 Paper 30712475 Paper 30661755 Paper 30649166 Paper 30400812 Paper 30377266 Paper 30016409 Paper 29997126 Paper 29909987 Paper 29481604 Paper 29335281 Paper 29194469 Paper 29121214 Paper 29025895 Paper 28985737 Paper 28967885 Paper 28945250 Paper 28263959 Paper 27669167 Paper 27311443 Paper 27246923 Paper 26873931 Paper 26868017 Paper 26328750 Paper 26078586 Paper 25964664 Paper 25086505 Paper 25007794 Paper 24995464 Paper 24501098 Paper 24435799 Paper 24413520 Paper 24398039 Paper 24330332 Paper 24076764 Paper 24068705 Paper 23957733 Paper 23888185 Paper 23762209 Paper 23549483 Paper 23457041 Paper 23151247 Paper 23034175 Paper 22493251 Paper 22285995 Paper 22087737 Paper 22039435 Paper 21955804 Paper 21747377 Paper 21706001 Paper 21685414 Paper 21450710 Paper 21400695 Paper 21154709 Paper 21144010 Paper 20838408 Paper 20701754 Paper 20431543 Paper 19912177

Abstract

DESCRIPTION (provided by applicant): Next Generation Sequencing (NGS) has become the most widely used high-throughput technology in biology. Today, NGS applications go far beyond genome sequencing and studies of DNA sequence itself to include the measurement of quantitative and dynamic outcomes underlying genomic function in development and disease. These measurements, specifically, RNA abundance, protein binding, DNA methylation, and microbiome composition, are at the core of studies undertaken by large consortia and individual labs alike. However, when measuring these quantitative outcomes, NGS data are subject to severe technological and biological biases, systematic errors, and unforeseen variability which can greatly impact downstream analyses. Only when these issues can be readily identified and addressed will the technology maximally benefit science and medicine. Our group has extensive experience developing statistical methods that transform raw high-throughput data into the ultimate measurements relied upon by biologists and clinicians. Our gene expression array preprocessing methods are practically an industry standard and our recent work on NGS applications is widely cited and used. Furthermore, Dr. Irizarry co-leads the Bioconductor project, one of the most widely used open-source projects for the development and dissemination of state-of-the-art statistical methodology. We propose to continue to leverage our experience with high-throughput technologies to develop indispensable analysis tools for NGS data in four critical, widely used applications urgently requiring reliable statistical analysis tools. At the core of our methods is the common need, across these four applications, to overcome bias, systematic error, and unforeseen variability. To aid in the development and assessment of these tools we propose experiments specifically designed to serve as benchmarks.
These problems are well matched to our specific expertise, and we will address them with the following aims. 1) Develop statistical methods for RNA transcript estimation that are robust to sequencing artifacts. 2) Develop statistical methods that estimate heterogeneous cell composition in DNA methylation data. 3) Develop statistical methods for unbiased quantification in microbial community 16S rRNA gene sequencing studies. 4) Develop methods that account for protocol-induced bias in genome-wide enrichment scans (e.g., ChIP-seq and DNase I-seq).
