Genome-wide hypermutation and structural instability

$1,804,408 | ZIA | FY2022 | ES | NIH

National Institute Of Environmental Health Sciences

Abstract

Purpose or scope: A role for somatic mutations in carcinogenesis and genetic disease is well accepted, but the degree to which mutation rates influence cancer initiation and development remains under debate. Recently accumulated genomic data have revealed that thousands of tumor samples are riddled with hypermutation, broadening support for the view that many cancers acquire a mutator phenotype. This major expansion of cancer mutation datasets has provided unprecedented statistical power for the analysis of mutation spectra, which has confirmed several classical sources of mutation in cancer, highlighted prominent new mutation sources, and empowered the search for cancer drivers. In our work, we applied mechanistic knowledge obtained through our experiments with yeast models to interrogate large whole-genome datasets of cancer mutations, in order to gain mechanistic insight into the impact of mutations on cancer and genetic disease.

Research subject: The optimal levels of genome instability needed to sustain fitness of an organism are maintained by a complex set of DNA metabolic functions and pathways. Understanding the interplay between the biological mechanisms maintaining a stable genome and the environmental factors promoting genome instability is important for improving policies pertaining to the impact of the environment on human health. My long-term interest is in understanding the physiological mechanisms and environmental causes of extreme levels of genome instability that can give rise to diseases and may alter the lifespan of organisms. During the reviewed period, my group and I addressed these questions by combining the following general approaches:

(i) Gaining new mechanistic information through research in yeast models, using reporter-based and whole-genome sequencing approaches. This approach elucidates mechanisms of genome instability and defines their specific features.

(ii) Using mechanistic knowledge acquired from small-genome studies to design analyses of publicly available large datasets of genome changes in human cancers. Knowledge acquired from mechanistic research in yeast allows us to build stringent statistical hypotheses, thereby increasing statistical power in the bioinformatic interrogation of the exponentially growing cancer genomics datasets such as The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC).

(iii) Assessing the load and signatures of somatic genome changes in humans. The analytical pipeline and the information about mutation signatures generated through interrogation of cancer genomics datasets are applied to whole-genome sequencing analyses of cells isolated from healthy individuals. The combination of approaches (i) and (iii) provides additional research opportunities: new knowledge generated through bioinformatic analysis of large public datasets and through sequencing genomes of human subjects is used to develop the next level of mechanistic hypotheses testable in small-genome systems.

(iv) Changes in RNA sequences occur through the life of an organism and through generations. These variations are associated with differential exposure to endogenous and environmental damaging agents acting on DNA genomes of cellular organisms as well as on genomes of DNA and RNA viruses. Over the last several years, my group studied only DNA mutations. However, recent studies, including our own, revealed that changes in RNA genomes and in RNA editing of non-replicating cellular RNAs can result from the same agents that act on DNA. Considering the importance of factors affecting the stability of viral RNA genomes and the generation of the cellular RNA editome, we extended the experimental and bioinformatics tools already developed by my group in DNA research to the study of induced changes in RNA sequences.
Accomplishments: Genomes of tens of thousands of SARS-CoV-2 isolates have been sequenced across the world, and the total number of changes (predominantly single base substitutions) in these isolates exceeds ten thousand. We compared the mutational spectrum in the new SARS-CoV-2 mutation dataset with the previously published mutation spectrum in hypermutated genomes of rubella, another positive-sense single-stranded (ss) RNA virus. Each of the rubella virus isolates arose by accumulation of hundreds of mutations during propagation in a single subject, while the SARS-CoV-2 mutation spectrum represents a collection of events in multiple virus isolates from individuals across the world. We found a clear similarity between the spectra of single base substitutions in rubella and in SARS-CoV-2, with C to U as well as A to G and U to C being the most prominent in the plus-strand genomic RNA of each virus. Of those, U to C changes universally showed a preference for loops versus stems in predicted RNA secondary structure. Similarly to what was previously reported for rubella virus, C to U changes showed enrichment in the uCn motif, which suggested that a subclass of APOBEC cytidine deaminases is a source of these substitutions. We also found enrichment of several other trinucleotide-centered mutation motifs only in SARS-CoV-2, likely indicative of a mutation process characteristic of this virus. Altogether, the results of this analysis suggest that the mutation mechanisms that lead to hypermutation of the rubella vaccine virus in a rare pathological condition may also operate in the background of the SARS-CoV-2 viruses currently propagating in the human population.

Human skin is continuously exposed to environmental DNA damage, leading to the accumulation of somatic mutations over the lifetime of an individual. Mutagenesis in human skin cells can also be caused by endogenous DNA damage and by DNA replication errors. The contributions of these processes to the somatic mutation load in the skin of healthy humans have so far not been accurately assessed because the low numbers of mutations from current sequencing methodologies preclude the distinction between sequencing errors and true somatic genome changes. In this work, we sequenced genomes of single cell-derived clonal lineages obtained from primary skin cells of a large cohort of healthy individuals across a wide range of ages. We report here the range of mutation load and a comprehensive view of the various somatic genome changes that accumulate in skin cells. We demonstrate that UV-induced base substitutions, insertions and deletions are prominent even in sun-shielded skin. In addition, we detect accumulation of mutations due to spontaneous deamination of methylated cytosines as well as insertions and deletions characteristic of DNA replication errors in these cells. The endogenously induced somatic mutations and indels also demonstrate a linear increase with age, while UV-induced mutation load is age-independent. Finally, we show that DNA replication stalling at common fragile sites is a potent source of gross chromosomal rearrangements in human cells. Thus, somatic mutations in the skin of healthy individuals reflect the interplay of environmental and endogenous factors in facilitating genome instability and carcinogenesis.
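
Illustrative sketch: the kind of motif-enrichment reasoning described above for C to U changes in the uCn motif can be expressed in a few lines of Python. This is a minimal sketch under stated assumptions, not the analytical pipeline used in this project. The function name `ucn_enrichment`, the toy sequence, and the choice to use the whole sequence as background (rather than a window around each mutated site) are all hypothetical simplifications. Enrichment is taken here as the fraction of C to U changes that fall in uCn divided by the fraction of all cytosines that sit in uCn; values above 1 would point toward a motif preference.

```python
def ucn_enrichment(genome: str, c_to_u_positions: list[int]) -> float:
    """Enrichment of C-to-U changes in the uCn motif (C with a 5' U).

    Assumptions (illustrative only):
      * `genome` is the plus-strand sequence; T is treated as U.
      * `c_to_u_positions` are 0-based indices of observed C-to-U changes.
      * The whole sequence serves as the background context.
    """
    seq = genome.upper().replace("T", "U")

    # Every C that has both a 5' and a 3' neighbour is a candidate site.
    c_sites = {i for i in range(1, len(seq) - 1) if seq[i] == "C"}

    # Subset of candidate sites that lie in the uCn motif (5' neighbour is U).
    motif_sites = {i for i in c_sites if seq[i - 1] == "U"}

    # Keep only reported positions that are candidate C sites; a position
    # reported more than once is counted a single time.
    mutated = set(c_to_u_positions) & c_sites
    mutated_in_motif = mutated & motif_sites

    if not mutated or not motif_sites:
        raise ValueError("need at least one C-to-U change and one uCn site")

    mutation_fraction = len(mutated_in_motif) / len(mutated)
    context_fraction = len(motif_sites) / len(c_sites)
    return mutation_fraction / context_fraction


if __name__ == "__main__":
    # Toy data, invented for illustration: three C-to-U changes,
    # two of which are preceded by U.
    toy_genome = "UCAGCAUCGACCAA"
    toy_changes = [1, 7, 10]
    print(f"uCn enrichment: {ucn_enrichment(toy_genome, toy_changes):.2f}")
```

A real analysis would also need a significance test on the underlying counts and a correction for testing several motifs; both are omitted here to keep the sketch short.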
