DNA methylation-based machine classification of cancer
Division of Basic Sciences - NCI
Abstract
Building on preliminary data that I have generated from publicly accessible DNA methylation data, combined with data generated through the NCI/LP, I will develop organ-specific methylation bioinformatic pipelines and machine learning-based classifiers using the previously described methodology. I will also perform a large-scale combined analysis to assess the feasibility of a pan-cancer DNA methylation classifier. Using publicly available data (DNA methylation, copy number, RNA-seq) and ongoing transcriptome sequencing through the LP, I am working to identify pathways through novel integrated bioinformatic techniques. The signal intensities from DNA methylation array data can be used to infer copy number, chromatin accessibility (Hi-C A/B compartments), and methylation of repetitive sequences such as Alu and LINE-1 elements. I will computationally integrate these independent data types to identify new tumor types and subtypes and potentially refine current classification schemes; because DNA methylation, gene transcription, and chromatin accessibility are tightly linked, this approach will likely reveal functionally relevant subtypes. To achieve this, I will apply novel integrative techniques such as density-aware spectral clustering, similarity network fusion (SNF), and Bayesian consensus clustering. In addition, I will employ a recently described unsupervised factor analysis approach (multi-omics factor analysis, MOFA) to identify shared and distinct sources of variability among CNS tumors. This approach will also identify latent factors that drive the variability in the data and reveal potentially targetable pathways.
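As an illustration of the statement that array signal intensities can be used to infer copy number, the following is a minimal sketch and not the project's actual pipeline. It assumes the common idea that the summed methylated and unmethylated intensity at a probe tracks DNA quantity, and compares a tumor sample to a per-probe median of reference samples. The function name `copy_number_log2_ratios`, its arguments, and the toy intensities are hypothetical; real analyses also normalize, bin probes, and segment the genome.

```python
import math
from statistics import median


def copy_number_log2_ratios(meth, unmeth, ref_meth, ref_unmeth):
    """Per-probe log2 ratio of tumor total intensity to a reference baseline.

    meth, unmeth: methylated / unmethylated intensities, one value per probe.
    ref_meth, ref_unmeth: one list of per-probe intensities per reference sample.
    (All names here are illustrative assumptions, not from the abstract.)
    """
    # Total intensity (M + U) reflects the amount of DNA at the probe.
    tumor_total = [m + u for m, u in zip(meth, unmeth)]
    ref_totals = [
        [m + u for m, u in zip(rm, ru)] for rm, ru in zip(ref_meth, ref_unmeth)
    ]
    # Baseline per probe: median total intensity across reference samples.
    baseline = [median(column) for column in zip(*ref_totals)]
    raw = [math.log2(t / b) for t, b in zip(tumor_total, baseline)]
    # Center on the sample median so global intensity differences cancel out.
    shift = median(raw)
    return [r - shift for r in raw]


# Toy data: four probes, two reference samples; the last probe has doubled signal.
ratios = copy_number_log2_ratios(
    meth=[1000, 2000, 500, 3000],
    unmeth=[1000, 0, 1500, 1000],
    ref_meth=[[1000, 1000, 1000, 1000], [500, 1500, 1000, 1200]],
    ref_unmeth=[[1000, 1000, 1000, 1000], [1500, 500, 1000, 800]],
)
```

In this toy input the first three probes match the reference baseline, while the fourth carries twice the total intensity, which corresponds to a log2 ratio of 1 and would be read as a candidate gain.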