Integrated approaches to decipher 3D genomes

$427,502 | U54 | FY2015 | DK | NIH

University Of Southern California, Los Angeles CA

Abstract

Project summary

A grand challenge of genome research is to combine data from DNA proximity ligation methods (such as Hi-C) with other available data to create accurate, predictive models of the 3D nuclear organization of the genome and to reveal the functional implications of different genome organizations. The limitations and incompleteness of the data make this a difficult task. For instance, Hi-C data describe only the average genome conformation over a large ensemble of cells, whereas the spatial organization of the genome is highly dynamic and varies among individual cells in the same sample. Moreover, ensemble Hi-C data cannot reveal higher-order information, such as whether several interactions co-occur in the same cell. Even single-cell Hi-C approaches are hampered by low interaction coverage per cell and by the limited number of cells that can be sampled, which makes it hard to capture the large conformational variability among genomes with statistical relevance. In addition, the dynamic nature of the genome makes it very challenging to obtain a comprehensive description of genome structure/function relationships by mining functionally relevant structural chromatin patterns.

There is therefore an urgent need for computational methods that can properly interpret Hi-C data for 3D genome modeling and analysis, and that can integrate these data with any other available information about genome organization, for example from imaging and other technologies. We propose a new population-based modeling approach that reframes the optimization of a population of genome structures as a maximum a posteriori (MAP) probability estimation problem. Our method can deconvolute ensemble Hi-C data into a population of genome structures that are, taken together, statistically consistent with the input data and that represent the best approximation of the true population of genome structures given the available data.
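One ingredient of deconvoluting ensemble Hi-C data into a structure population can be illustrated with a small sketch: if a pair of domains is in contact in a fraction p of cells, then roughly p·M of the M model structures should carry that contact, and the model must decide which ones. The code below is a minimal, hypothetical illustration of that bookkeeping only. It assumes the genome is coarse-grained into N domains, that contact probabilities in [0, 1] are already available, and it uses a simple greedy rule (give each contact to the structures where the two domains are already closest). The names `assign_contacts`, `population_frequencies`, and the indicator tensor `W` are illustrative assumptions, not the proposal's MAP algorithm; a full method would alternate such an assignment with a structure-optimization step and would weigh data fit against prior terms.

```python
import numpy as np


def assign_contacts(contact_prob, coords):
    """Hypothetical assignment step for population-based modeling.

    contact_prob: (N, N) symmetric array of ensemble contact probabilities in [0, 1].
    coords: (M, N, 3) array of domain coordinates for M model structures.
    Returns a boolean (M, N, N) array W; W[m, i, j] is True when structure m
    is required to place domains i and j in contact.
    """
    n_struct, n_dom, _ = coords.shape
    W = np.zeros((n_struct, n_dom, n_dom), dtype=bool)
    for i in range(n_dom):
        for j in range(i + 1, n_dom):
            # Number of structures that must carry this contact so that the
            # population reproduces the ensemble frequency (rounded to an integer).
            k = int(round(float(contact_prob[i, j]) * n_struct))
            if k == 0:
                continue
            # Greedy choice (an assumption of this sketch): use the k structures
            # in which domains i and j are already closest to each other.
            dist = np.linalg.norm(coords[:, i, :] - coords[:, j, :], axis=1)
            chosen = np.argsort(dist, kind="stable")[:k]
            W[chosen, i, j] = True
            W[chosen, j, i] = True
    return W


def population_frequencies(W):
    """Fraction of structures carrying each contact, comparable to Hi-C frequencies."""
    return W.mean(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    coords = rng.normal(size=(10, 4, 3))  # 10 structures, 4 domains (toy data)
    P = np.array([[1.0, 0.3, 0.0, 0.1],
                  [0.3, 1.0, 0.7, 0.0],
                  [0.0, 0.7, 1.0, 0.5],
                  [0.1, 0.0, 0.5, 1.0]])
    W = assign_contacts(P, coords)
    err = np.abs(population_frequencies(W) - P)[np.triu_indices(4, k=1)]
    print("max off-diagonal deviation:", err.max())
```

Choosing the closest structures keeps the geometric changes needed in the subsequent optimization small. The residual mismatch between population and ensemble frequencies is bounded by the rounding to whole structures, so it shrinks as M grows.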
Our probabilistic approach provides a framework for integrating all available data, including ensemble-averaged and single-cell Hi-C data as well as data from other experimental sources (e.g., imaging), to increase the coverage, accuracy, and resolution of the predictive genome models. We also develop a graph mining approach to discover chromatin patterns in an ensemble of genome structures and to relate these patterns to a variety of nuclear processes, such as transcription, translocation, and DNA replication.
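A structure population makes same-cell co-occurrence of interactions countable, which ensemble Hi-C pair frequencies cannot provide. The sketch below is a deliberately simple stand-in for graph mining, not the graph mining method proposed here: it turns each structure into a contact graph (domains as nodes, edges between domains closer than a distance cutoff) and counts how often each triangle of three mutually contacting domains appears across the population. The function names `contact_edges`, `triangles`, and `frequent_triangles`, and the `cutoff` and `min_support` parameters, are hypothetical choices made for illustration.

```python
from collections import Counter

import numpy as np


def contact_edges(coords, cutoff):
    """Edges (i, j) with i < j whose distance in one structure is <= cutoff."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Upper triangle only (k=1) skips self-pairs and duplicate (j, i) edges.
    i, j = np.nonzero(np.triu(dist <= cutoff, k=1))
    return set(zip(i.tolist(), j.tolist()))


def triangles(edges):
    """All sets of three domains that are pairwise in contact in one graph."""
    adj = {}
    for i, j in edges:
        adj.setdefault(i, set()).add(j)
        adj.setdefault(j, set()).add(i)
    found = set()
    for i, j in edges:
        # Any common neighbor k closes a triangle; sorting deduplicates it.
        for k in adj[i] & adj[j]:
            found.add(tuple(sorted((i, j, k))))
    return found


def frequent_triangles(population, cutoff, min_support):
    """Triangles present in at least a min_support fraction of structures."""
    counts = Counter()
    for coords in population:
        counts.update(triangles(contact_edges(coords, cutoff)))
    n = len(population)
    return {tri: c / n for tri, c in counts.items() if c / n >= min_support}
```

Triangles are the smallest higher-order pattern; the same per-structure graphs could be mined for larger recurring subgraphs (e.g., cliques) at higher computational cost.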

Source: NIH RePORTER