Deciphering CTCF code in mammalian host and viral epigenomes
National Institute Of Allergy And Infectious Diseases
Abstract
CTCF, a highly conserved DNA-binding protein, serves as a global organizer of chromatin architecture. CTCF is involved in regulation of transcriptional activation and repression, gene imprinting, control of cell proliferation and apoptosis, chromatin compartmentalization, X-chromosome inactivation, prevention of trinucleotide-repeat expansions, and other chromatin-resident processes. It took us over 20 years of CTCF studies to persuade others that the multi-functionality of CTCF is indeed based on the ability of a highly conserved 'multivalent' 11 zinc finger (ZF) DNA-binding domain (DBD) to bind a wide range of diverse DNA sequences, as well as on its intrinsic capacity to interact with partner proteins through the combinatorial usage of DNA-contacting and protein-contacting ZFs. Last year, a similar multivalency was shown for another poly-ZF DBD array in the Drosophila Su(Hw) factor. With the advent of next-generation sequencing techniques, CTCF binding sites have been identified across fly, mouse, and human genomes. Reflecting the multitude of CTCF functions, many thousands of non-homologous CTCF target site (CTS) sequences were found to be associated with genomic regions engaged in long-range chromatin interactions, including enhancers, promoters, and intergenic boundary elements. It remained obscure, however, how the particular DNA sequence of any given CTS is related to specific CTCF functions at the same site. This year, we have made additional advances toward understanding the multiple functionality of distinct CTCF/DNA complexes formed via different combinations of DNA-contacting fingers. By mapping simultaneous CTCF and BORIS occupancy genome-wide, we uncovered two classes of CTCF binding regions that are pre-programmed and evolutionarily conserved in DNA sequence.
We found that 70% of CTCF-bound regions enclose a single CTCF binding site (aka 1xCTSes), while the other 30% of CTCF-binding regions, detected by ChIP-seq as single peaks, in fact contain dual CTCF binding sites (aka binary 2xCTSes). Occupancy of adjacent CTSes within binary 2xCTS regions constrains two adjacent CTCF proteins to form homodimers in normal somatic cells, or to assemble CTCF+BORIS heterodimers co-bound at the same DNA spot in germ and cancer cells that co-express BORIS in addition to CTCF. The recent breakthrough discovery of 2xCTS regions (unresolvable by standard CTCF-specific ChIP-seq) enabled us, for the first time, to address the long-standing question of how CTCF can serve, in the context of the same nucleus, as a bona fide transcription factor while maintaining a substantial presence at putative insulator/boundary sites that bear no indications of transcriptional activity. Indeed, only 20% of all CTCF binding regions are located in promoter regions in any given cell type, while the remaining CTSes are not associated with transcriptional start sites. The obvious candidates for the determinants of such distinct functional roles would be the DNA sequences themselves and/or the differential identity of chromatin at these two types of sites. In our study, we presented genome-wide evidence that the DNA sequences underlying the two types of CTCF target sites are structurally different. The structural difference between the two classes of CTCF binding sites is connected to their functional differences: 2xCTSes are preferentially located at H3K27ac-marked promoters and enhancers co-bound by Pol II, and the same 2xCTS elements are found to be associated with normal CTCF-BORIS heterodimers in post-meiotic spermatids, wherein BORIS marks the future protamine-free DNA zones that retain modified histones along the haploid epigenome in mature human and mouse spermatozoa.
In stark contrast, intergenic and intronic genomic regions harbor one or more 1xCTS-based CTCF peaks with the name-giving 5'-CCC(C/t)CT(a/g)-3' motif, which is often hit by disease-associated SNPs that affect three-dimensional organization. That organization is built upon essential self-interactions among sticky C-termini and DNA-free ZF subsets from distal CTCF/DNA complexes engaged in site-specific di-/multimerization stabilized by cohesin retention. A remarkable link with CTCF+/- haploinsufficiency found in genetically burdened human subjects might open up a novel avenue in clinically oriented CTCF studies. Such studies concern aberrant histone/DNA methylation encompassing CTCF-bound ChIP-seq peaks with 2xCTS elements in H3K27ac-marked, Pol II-bound promoter-enhancer pairs, which are capable of altering gene expression in the same way that we had previously found in Ctcf+/- mice analyzed in collaboration with Fred Hutchinson Cancer Center in Seattle. Therefore, similar pathology-associated mechanisms seem to underlie both human and mouse genetic disorders caused by insufficient CTCF dosage, exclusive of additional ZF mutations which, even in tumors with 16q22/CTCF LOH, would cause a complete CTCF loss leading to death rather than a partial loss of DNA-CTCF interactions caused by in vivo selection of viable single amino acid substitutions within the multivalent 11 ZF CTCF DBD that were characterized first in CTCF (1996) and found later (2002) to be recapitulated in the CTCF-derived paralog named BORIS (an acronym for Brother Of the Regulator of Imprinted Sites).
Next, our discovery and further studies of the binary 2xCTS code have begun to challenge a widespread misconception in the current literature, namely that all CTCF sites are equivalent to each other, with a single CTCF molecule bound at a single CTS sequence. In fact, CTS elements with different genomic coordinates may contain either one or two adjacent DNase I footprints over single or dual CTCF motifs, without any homologies necessary for reliable motif-based predictions. The functional and structural epigenetic features of Pol II-bound, enhancer/promoter-associated 2xCTS elements are distinct from the same features of 1xCTS-containing regions bound by CTCF-only monomers within intronic and intergenic non-coding regions. The previously overlooked class of CTCF binding regions with two (rather than one) closely spaced CTCF motifs (aka 2xCTS) has a very distinct role in regulating diverse chromatin-based phenomena, including heritable epigenetic regulation in cancer cells and in normal germ cells. For instance, non-random retention of sperm nucleosomes was found to be predetermined by the specific nucleotide context of 2xCTS-containing regulatory DNA elements that are normally co-bound by both CTCF and BORIS, the 11 ZF paralogs co-expressed in post-meiotic round spermatids. Moreover, our latest Nature Communications paper, available online at https://www.nature.com/articles/s41467-021-24140-6, has described in detail a totally unpredictable synergistic effect of combining Ctcf haploinsufficiency with the Boris-/- null genotype in a novel double-knockout (DKO) mouse strain, which revealed that the combined action of CTCF and BORIS heterodimers is absolutely essential for spermatogenesis and fertility. In addition, CTCF and cohesin are now widely recognized as the key players in 3D genome architecture in all mammalian cells.
These two proteins are not just well known in the scientific community but have recently entered the popular press, such as Scientific American magazine at https://www.scientificamerican.com/article/untangling-the-formation-of-dna-loops. Taken together, our results allowed a global view of chromatin dynamics and provided unique resources for studying long-range epigenetic control of gene expression in distinct cell lineages. Finally, the synergistic DKO effects led us toward a stunning breakthrough understanding: in BORIS+ mammals, only CTCF has been singled out by Mother Nature to serve as a truly universal and even irreversible epigenetic mark, present on DNA not only in all somatic cell types but also in mature spermatozoa before and after fertilization, throughout endless rounds of species reproduction.
View original record on NIH RePORTER →