Deciphering CTCF code in human host and viral epigenomes
National Institute of Allergy and Infectious Diseases
Abstract
CTCF, a highly conserved DNA-binding protein, serves as a global organizer of chromatin architecture. CTCF is involved in the regulation of transcriptional activation and repression, gene imprinting, control of cell proliferation and apoptosis, chromatin compartmentalization, X-chromosome inactivation, prevention of trinucleotide-repeat expansions, and other chromatin-resident processes. After our original discovery of CTCF, it took us over 20 years of highly focused CTCF studies to persuade others that the multi-functionality of CTCF rests on two properties. First, its highly conserved 'multivalent' eleven-zinc-finger DNA-binding domain (aka "multivalent 11 ZF DBD") can recognize and bind a wide range of diverse DNA sequences. Second, CTCF has an unusual intrinsic capacity to interact with RNA and partner proteins through the combinatorial usage of individual DNA- and protein-contacting zinc fingers, as determined by the specific context of a given target DNA site. So far, similar multivalency has been shown for only one other poly-ZF array, found in the DNA-binding domain of the Drosophila Su(Hw) factor. However, unlike the universal CTCF, Su(Hw) is an important insulator protein only in Drosophila because its gene was completely lost during vertebrate evolution. Moreover, with the advent of next-generation sequencing techniques, CTCF binding sites have been identified in a growing number of animal species and, in particular, across numerous individual human genomes of different origin. Reflecting the multitude of CTCF functions, many thousands of non-homologous CTCF target site (aka "CTS") sequences were found to be functionally associated with genomic regions engaged in long-range chromatin interactions, including enhancers, promoters, and intergenic boundary elements. It remained obscure, however, how the particular DNA sequence of a given CTS could be causally related to specific CTCF function(s) at the same site.
A few years ago we made one of the major advances towards understanding the multiple functionality of distinct CTCF/DNA complexes formed via different combinations of DNA-contacting fingers both in vitro and in vivo. By mapping simultaneous CTCF and BORIS occupancy genome-wide, we uncovered a new class of CTCF binding regions that are functionally pre-programmed and evolutionarily conserved to serve as epigenetic marks for positioning two adjacent CTCF/BORIS-binding motifs (rather than a single one) at specific genomic coordinates. We found that approximately 70% of CTCF-bound ChIP-Seq peak regions enclose a single CTCF-binding target site (aka "1xCTS"), whereas the other 30% of CTCF-binding regions, although detected by ChIP-Seq as single peaks, in fact contain double or dual CTCF target sites, designated binary "2xCTSes". Actual in situ occupancy of the adjacent sites within binary 2xCTS regions constrains two adjacent self-interacting proteins either to form homodimers (in normal somatic cells devoid of BORIS) or to assemble heterodimers of the paralogous CTCF and BORIS proteins co-bound to DNA at the same narrow genomic spot (in germ and cancer cells that co-express BORIS together with its ubiquitous partner, CTCF). The recent breakthrough discovery of 2xCTS regions, which comprise two adjacent CTCF motifs unresolvable by available peak-calling algorithms, made it possible for the first time to address the long-standing question of how CTCF can serve, within the same nucleus, as a bona fide transcription factor while maintaining a substantial simultaneous presence at putative insulator/boundary sites that bear no indications of transcriptional activity. Indeed, only 20% of all CTCF binding regions are located in promoter regions in any given cell type, while the remaining CTSes are not associated with transcriptional start sites.
The obvious candidates for the determinants of such distinct functional roles would be the DNA sequences themselves and/or the differential identity of chromatin at these two types of sites. In our study we presented genome-wide evidence that the DNA sequences underlying the two types of CTCF target sites are structurally different. The structural difference between the two classes of CTCF binding sites is connected to their functional differences: 2xCTSes are preferentially located at H3K27ac-marked promoters and enhancers co-bound by Pol II, and the same 2xCTS elements are found to be associated with normal CTCF/BORIS heterodimers in post-meiotic spermatids, wherein BORIS marks the future protamine-free DNA zones that retain modified histones along individual haploid epigenomes in mature human and mouse spermatozoa. In stark contrast, intergenic and intronic genomic regions harbor one or more 1xCTS-based CTCF peaks with the name-giving 5'-CCC(C/t)CT(a/g)-3' motif, which is often hit by a disease-associated SNP affecting three-dimensional organization. This organization is built upon essential self-interactions among sticky C-termini and DNA-free ZF subsets from distal CTCF/DNA complexes engaged in site-specific di-/multimerization stabilized by cohesin retention. A remarkable link with CTCF+/- haplo-insufficiency found in genetically burdened human subjects might open up a novel avenue in clinically oriented CTCF studies. Such studies would address aberrant histone/DNA methylation encompassing CTCF-bound ChIP-Seq peaks with 2xCTS elements in H3K27ac-marked, Pol II-bound promoter-enhancer pairs capable of altering gene expression in the same way that we had previously found in Ctcf+/- mice analyzed in our lab and in several others.
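As an illustration only, the name-giving 5'-CCC(C/t)CT(a/g)-3' motif can be read as the pattern CCC[CT]CT[AG]. The minimal sketch below counts matches of that pattern on both strands of a peak sequence and reports how many were found. The function names, the example sequences, and the hit-count labels are hypothetical and introduced here for illustration; this is not the method used in the study, and such a short, degenerate pattern should not be expected to separate 1xCTS from 2xCTS regions reliably.

```python
import re

# Core motif 5'-CCC(C/t)CT(a/g)-3' written as a regex. Assumption: the
# lower-case letters in the text mark the less frequent base at a position.
# The lookahead lets overlapping matches be reported as well.
CORE_MOTIF = re.compile(r"(?=(CCC[CT]CT[AG]))")
MOTIF_LENGTH = 7

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def motif_hits(seq: str) -> list[tuple[int, str]]:
    """Return sorted (forward_start, strand) pairs for core-motif matches."""
    seq = seq.upper()
    hits = [(m.start(), "+") for m in CORE_MOTIF.finditer(seq)]
    for m in CORE_MOTIF.finditer(reverse_complement(seq)):
        # Convert the reverse-strand offset into a forward-strand start.
        hits.append((len(seq) - m.start() - MOTIF_LENGTH, "-"))
    return sorted(hits)


def classify_by_hit_count(seq: str) -> str:
    """Hypothetical label based purely on the number of core-motif hits."""
    n = len(motif_hits(seq))
    if n == 0:
        return "no core motif"
    return "single core motif" if n == 1 else "two or more core motifs"


if __name__ == "__main__":
    # Made-up sequences, not real genomic regions.
    for peak in ("AAACCCTCTGAAA", "AAACCCTCTGAAAACAGAGGGAAA"):
        print(peak, motif_hits(peak), classify_by_hit_count(peak))
```

Scanning both strands matters because a motif can sit on either strand of a peak region. In practice, footprinting or binding data rather than pattern matching would be needed to decide whether a region holds one or two sites.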
Therefore, similar pathology-associated mechanisms seem to underlie both human and mouse genetic disorders caused by insufficient CTCF dosage exclusive of additional ZnF mutations. Even in tumors with 16q22/CTCF LOH, such mutations would cause a complete CTCF loss leading to death, rather than the partial loss of DNA-CTCF interactions caused by in vivo selection of viable single amino-acid substitutions within the multivalent 11 ZnF CTCF DBD, a domain that was characterized first in CTCF (1996) and later (2002) found to be recapitulated in the mammalian CTCF-derived paralogue named "BORIS" (an acronym for "Brother Of the Regulator of Imprinted Sites"). Next, our discovery and further studies of the binary 2xCTS code began to challenge a widespread misconception in the current literature, namely that all CTCF sites sharing a highly degenerate consensus motif bound by a single CTCF molecule are equivalent to each other. However, even a single CTCF target sequence with different genomic coordinates was proven to contain either one or two adjacent DNase I footprints over single or dual CTCF motifs that lack the homology needed for reliable motif-based predictions. The functional and structural epigenetic features of Pol II-bound enhancer/promoter-associated 2xCTS elements are distinct from those of 1xCTS-containing regions bound by CTCF-only monomers within intronic and intergenic non-coding regions. The previously overlooked class of CTCF binding regions with two (rather than one) closely spaced CTCF motifs (aka "2xCTS") has a very distinct role in regulating diverse chromatin-based phenomena, including heritable epigenetic regulation in cancer cells and in normal germ cells. For instance, non-random retention of sperm nucleosomes was found to be predetermined by the specific nucleotide context of 2xCTS-containing regulatory DNA elements (aka "reg DNA words"), which are normally co-bound by both CTCF and BORIS co-expressed in late round spermatids.
Moreover, our recent Nature Communications publication, available online at https://www.nature.com/articles/s41467-021-24140-6, described an unpredictable synergistic effect of combining Ctcf haplo-insufficiency with the Boris-/- null genotype in a novel mouse DKO strain, which revealed that CTCF+BORIS heterodimers are absolutely essential for spermatogenesis and fertility. In addition, CTCF and cohesin are now widely recognized as the key players in 3D genome architecture in all mammalian cells. These two proteins are not only well known in the scientific community but have also recently entered the popular press and media. Taken together, our data allowed us to develop a global view of chromatin dynamics and provided unique resources for studying long-range epigenetic control of gene expression in distinct cell lineages. Finally, the synergistic DKO effects led us to the stunning understanding that only CTCF has been singled out by Mother Nature to serve as a truly universal and irreversible epigenetic mark, present on DNA not only in all somatic cell types but also in mature spermatozoa, before and after fertilization, over endless rounds of reproduction. Unlike other Cancer-Testis Antigens (CTAs) selected for immunotherapy trials, only BORIS can re-activate all other known CTA genes. Hence, it is not surprising that experimental studies of the CTA BORIS, encoded by the paralogous CTCF-Like gene on chromosome 20q13, have begun rising to the top rank of proprietary translational research.