Statistical Analysis Methods and Software for ChIP-seq Data

$299,496 · R01 · FY2014 · HG · NIH

University Of Wisconsin-Madison, Madison WI

Investigators

Linked publications & trials

Paper 39468737 Paper 39386499 Paper 38640488 Paper 37961473 Paper 37856321 Paper 37740957 Paper 37295843 Paper 37004197 Paper 36747701 Paper 36347925 Paper 36253828 Paper 35769482 Paper 35652733 Paper 34949667 Paper 34491912 Paper 32958497 Paper 31328826 Paper 30988468 Paper 30534948 Paper 29910842 Paper 29126153 Paper 28911122 Paper 28781711 Paper 28541490 Paper 28220625 Paper 27856289 Paper 27835030 Paper 27576189 Paper 27405803 Paper 26609213 Paper 26598390 Paper 26484757 Paper 26478641 Paper 26423458 Paper 26114571 Paper 26092860 Paper 26073540 Paper 25614629 Paper 25533967 Paper 25411484 Paper 25380244 Paper 24966364 Paper 24816274 Paper 24722192 Paper 24146601 Paper 24005282 Paper 23872977 Paper 23844871 Paper 23818864 Paper 22996659 Paper 22883957 Paper 22492709 Paper 22354995 Paper 22205700 Paper 22081761 Paper 22057161 Paper 21808000 Paper 21779159 Paper 21044070 Paper 20802488 Paper 20361856 Paper 20232521 Paper 19966067 Paper 19941826 Paper 19602540 Paper 19572828 Paper 19369425 Paper 19270271 Paper 18779319 Paper 18411210 Paper 18385155 Paper 18229712

Abstract

DESCRIPTION (provided by applicant): The advent of high-throughput next-generation sequencing (NGS) technologies has revolutionized the fields of genetics and genomics by allowing rapid and inexpensive sequencing of billions of bases. Among the NGS applications, ChIP-seq (chromatin immunoprecipitation followed by NGS) is perhaps the most successful to date. ChIP-seq technology enables investigators to study genome-wide binding of transcription factors and to map epigenomic marks. Both play crucial roles in programming gene expression in a cell-specific manner; their genome-wide mapping can therefore significantly advance our ability to understand and diagnose human diseases. Although the number of basic analysis tools for ChIP-seq data is rapidly increasing, all of the available methods share one or more of the following shortcomings. First, they focus on analyzing one ChIP-seq sample at a time. As ChIP-seq becomes commonly used in epigenome mapping to understand phenotypic variation, the demand for methods that can handle multiple samples efficiently is rapidly rising. Second, they use only sequence reads that align to unique locations on the reference genome. This hinders the study of highly repetitive regions of genomes by ChIP-seq. Third, commonly used designs for ChIP-seq experiments employ one matching control sample for each ChIP-seq sample. This limits the genome coverage of control experiments and impairs the detection of enrichment in ChIP samples. It also significantly increases sequencing costs for large-scale ChIP-seq studies. The objective of this project is to address these challenges of ChIP-seq analysis in three specific aims: (1) Statistical methods for inference from multiple samples; (2) Probabilistic models for utilizing reads that map to multiple locations (multi-reads) in the genome; (3) Development and evaluation of in silico pooling designs for control experiments.
The aims will be accomplished through a combination of methodological development, simulation, computational analysis, and experimental validation. Methods will be developed and evaluated using datasets from the ENCODE, modENCODE, and RoadMap Epigenomics consortiums as well as novel datasets from collaborators. Statistical resources generated from the project, which will be disseminated in publicly available software, will provide essential tools for the efficient design and analysis of ChIP-seq experiments.
