Comparative Genomics Analysis Unit Research

$1,512,593 | ZIA | FY2022 | HG | NIH

National Human Genome Research Institute

Abstract

Bioinformatics Developments

In 2022, the CGAU continued to work on software for analyzing large insertions, deletions, and inversions in genomes, collectively known as structural variants (SVs). A new project involved building machine learning models to detect structural variants from short-read data, for which the unit created a Snakemake pipeline, simSV, that simulates short, paired reads from altered haplotypes. The unit is currently using these simulated reads to train neural networks and random forest classifiers to recognize hard-to-detect deletions in the alpha thalassemia region on human chromosome 16.

In addition, the CGAU continues to maintain and improve its SVanalyzer toolkit for the generation and analysis of structural variant calls. SVanalyzer is unique in its ability to reconcile different representations of structural variants in highly repetitive regions of the genome, a critical capability for analyzing highly complete genomes like the new telomere-to-telomere (T2T) reference. This year, the unit used SVanalyzer to characterize the length distribution of similar sequence surrounding insertions and deletions, adding to the description of human genome architecture.

Collaborative Work

The publication of a complete, telomere-to-telomere human genome reference in 2022 marked a major milestone by assembling all human chromosomes (1-22 and X) of the CHM13 cell line in their entirety, giving future genomic analyses a complete sequence substrate to build upon. The CGAU participated in the T2T consortium, led by Dr. Adam Phillippy of NHGRI, as well as its variant calling workgroup, and performed analyses to examine how the new T2T reference improves our ability to detect, describe, and genotype variation in human populations (Nurk, Koren et al. 2022), (Aganezov, Yan et al. 2022).

In collaboration with Dr. Phillippy and Dr. Chirag Jain of the Indian Institute of Science, the unit also helped to evaluate Dr. Jain's Winnowmap2 long-read aligner. This new version of Winnowmap enables more effective detection of structural variants in repetitive regions of the genome by finding contiguous, high-confidence alignments in the regions surrounding them. By anchoring alignments in these minimal confidently alignable substrings, or MCASs, Winnowmap2 correctly chooses which of multiple paralogous regions to align to, even in the presence of large deletions or insertions (Jain, Rhie et al. 2022).

In collaboration with the Reproductive Cancer Genetics Section, led by Dr. Daphne Bell, the CGAU published a study identifying KLF3 and PAX6 as candidate driver genes in late-stage endometrial cancers. Because the study tested early-stage tumors only for genes already determined to be significantly mutated in late-stage endometrial tumors, it had sufficient statistical power to show that KLF3 and PAX6 are significantly mutated in late-stage, but not in early-stage, endometrial tumors. These results showed that KLF3 and PAX6, which this study also found to be microsatellite instability (MSI) target genes, are potentially associated with disease progression in endometrial cancer (Rudd, Hansen et al. 2022).
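
As a supplementary illustration of why structural variant representations in repetitive regions need reconciling, as noted for SVanalyzer above, the sketch below shows that deletions reported at different coordinates inside a tandem repeat can yield the identical alternate haplotype. It is a minimal toy example and not SVanalyzer's actual algorithm; the sequence, coordinates, and the function names apply_deletion and same_haplotype are all hypothetical.

```python
from typing import Tuple


def apply_deletion(ref: str, start: int, length: int) -> str:
    """Return the haplotype obtained by deleting ref[start:start + length]."""
    if start < 0 or length < 0 or start + length > len(ref):
        raise ValueError("deletion falls outside the reference")
    return ref[:start] + ref[start + length:]


def same_haplotype(ref: str, del_a: Tuple[int, int], del_b: Tuple[int, int]) -> bool:
    """Two deletion calls are equivalent if they produce the same sequence."""
    return apply_deletion(ref, *del_a) == apply_deletion(ref, *del_b)


# Hypothetical toy reference: unique flank, four copies of "CAG", unique flank.
REF = "TTGA" + "CAG" * 4 + "ACCT"

# Hypothetical calls as (0-based start, length).
call_a = (4, 3)   # removes the first CAG copy
call_b = (10, 3)  # removes the third CAG copy
call_c = (5, 3)   # shifted by one base, removes "AGC"
call_d = (1, 3)   # overlaps the unique left flank

print(same_haplotype(REF, call_a, call_b))
print(same_haplotype(REF, call_a, call_c))
print(same_haplotype(REF, call_a, call_d))
```

Comparing the resulting haplotype sequences rather than the reported coordinates is one simple way to recognize that the first three calls describe the same event, while the fourth, which touches unique flanking sequence, does not.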
