New methods for structural variation analysis of rearranged cancer genomes using

$383,639 | ZIA | FY2023 | CA | NIH

Division of Basic Sciences - NCI

Abstract

Most current long-read analysis tools for genome assembly and variant detection were not designed for tumor genomes and fail to distinguish somatic variants and complex rearrangements in cancer genomes. In this sub-project, we aim to develop new fundamental algorithms and tools for long-read analysis of tumor genomes that specifically address somatic variant calling, clonality, aneuploidy, and complex rearrangements. We expect this project to result in multiple open-source tools, freely available to the basic and clinical research community. The algorithmic developments will be informed by the analysis of specific biological collaborations described in sub-project 2. Access to high-quality long-read tumor sequencing data is critical for new algorithmic developments. In Aim 1.1, we will collaborate with multiple intramural and extramural researchers and sequencing cores to generate diverse sequencing datasets spanning different cancer and sample types. The current plans include sequencing of matched tumor/normal cell lines, head and neck and cervical cancers, osteosarcoma, melanoma, and pediatric leukemia tumors. In Aim 1.2, we will develop a new algorithm for detecting somatic rearrangements in matched tumor-normal long-read sequencing data, as well as in multi-site and time-series samples. We use a genomic graph approach to cluster breakpoints from complex events, such as chromothripsis or breakage-fusion-bridge cycles. Other algorithmic challenges include matching VNTR indels and detecting collapsed segmental duplications. Further, in Aim 1.3, we will develop a method for reconstructing cancer karyotypes in the presence of aneuploidy and large copy number alterations (CNAs). Importantly, the algorithm will reconstruct haplotype-specific coverage profiles, taking advantage of direct variant phasing with long reads. The implementation will rely on various unsupervised probabilistic models, such as Hidden Markov Models (HMMs) or Gaussian Mixture Models.
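
To illustrate how a Hidden Markov Model could segment a haplotype-specific coverage profile into copy-number states, the following is a minimal sketch and not the project's actual method. Everything in it is an assumption made for illustration: the function name `viterbi_copy_number`, the Gaussian emission model, and the fixed parameters `per_copy_depth`, `sigma`, and `stay_prob`. A real tool would presumably estimate such parameters from the data instead of fixing them.

```python
import math


def viterbi_copy_number(coverage, per_copy_depth=15.0, max_cn=4,
                        sigma=3.0, stay_prob=0.99):
    """Most likely copy-number path for one haplotype's binned coverage.

    Hypothetical toy model: hidden states are integer copy numbers
    0..max_cn, each bin's coverage is Gaussian around cn * per_copy_depth,
    and the chain stays in its state with probability stay_prob.
    """
    if not coverage:
        return []
    states = list(range(max_cn + 1))
    log_stay = math.log(stay_prob)
    # Remaining probability mass is split evenly over the other states.
    log_switch = math.log((1.0 - stay_prob) / (len(states) - 1))

    def log_emit(x, cn):
        # Log-density of a Gaussian with mean cn * per_copy_depth.
        mean = cn * per_copy_depth
        return (-0.5 * ((x - mean) / sigma) ** 2
                - math.log(sigma * math.sqrt(2.0 * math.pi)))

    def log_trans(prev, cur):
        return log_stay if prev == cur else log_switch

    # Initialise with a uniform prior over copy-number states.
    score = [-math.log(len(states)) + log_emit(coverage[0], s) for s in states]
    backpointers = []
    for x in coverage[1:]:
        new_score, pointers = [], []
        for s in states:
            best_prev = max(states, key=lambda p: score[p] + log_trans(p, s))
            new_score.append(score[best_prev] + log_trans(best_prev, s)
                             + log_emit(x, s))
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Synthetic per-bin coverage for two phased haplotypes (made-up numbers).
hap1 = [15, 14, 16, 15, 30, 31, 29, 30, 30]   # gain on haplotype 1
hap2 = [15, 15, 15, 15, 0, 1, 0, 0, 1]        # loss on haplotype 2
print(viterbi_copy_number(hap1))
print(viterbi_copy_number(hap2))
```

Running the model separately on each haplotype is what makes the profile haplotype-specific: in the synthetic input, total coverage stays near 30 across all bins, so a gain on one haplotype paired with a loss on the other would be invisible without phasing.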
