An Accurate Machine Learning Framework for Childhood Acute Myeloid Leukemia Subtype Identification by Integrating Bulk and Single-Cell Multi-Omics Data Within and Beyond the CCDI Ecosystem

$500,000 · P30 · FY2023 · CA · NIH

University Of Nebraska Medical Center, Omaha NE

Abstract

As a fatal childhood hematopoietic malignancy characterized by clonal expansion of immature myeloid precursors, acute myeloid leukemia (AML) usually leads to bone marrow failure and impaired hematopoiesis. AML has multiple distinct subtypes characterized by morphological, molecular, and genetic alterations. Identifying AML subtypes can facilitate downstream risk stratification and tailored treatment design. Conventional methods such as morphological analysis, cytogenetic analysis, immunophenotyping, and molecular profiling have been used for AML subtype identification, but they are usually costly, time-consuming, labor-intensive, and sometimes inaccurate. Next-generation sequencing (NGS) has recently been applied to AML subtype identification, but existing approaches are limited to bulk NGS data or to single-omics data. With large volumes of omics data being generated within and beyond the Childhood Cancer Data Initiative (CCDI) ecosystem, we hypothesize that integrating single-cell and bulk multi-omics data, including genomics, transcriptomics, and epigenomics data, will significantly facilitate subtype-specific biomarker discovery and boost the accuracy of AML subtype identification. In this supplemental project under our parent award (CA036727), we propose to develop an integrated machine learning (ML) framework for accurate and cost-effective AML subtype identification by combining bulk and single-cell multi-omics data within and beyond the CCDI ecosystem. To achieve this, we plan to undertake two specific aims. In Aim 1, we will establish a knowledge-transfer ML model that leverages large-scale bulk and single-cell transcriptomics data for AML subtype identification.
Besides identifying well-annotated AML subtypes, we will also explore novel AML subtypes by detecting rare cell types in large-scale single-cell data; the resulting cluster-specific and rare-cell-type-specific gene signatures can then be transferred to bulk transcriptomics data to improve the performance of AML subtype identification. In Aim 2, we will develop a multi-kernel learning and a multi-modal deep learning framework to systematically and automatically integrate deep information related to AML subtypes from single-cell and bulk multi-omics data (including genomics, transcriptomics, and epigenomics) to further boost AML subtype identification. Our model is flexible enough to handle cases in which only partial or incomplete multi-omics data are available for new patients. We believe successful completion of this study will directly improve downstream childhood AML risk stratification, facilitate diagnosis and prognosis, and optimize treatment selection. We also expect that the proposed framework can be customized and extended to identify subtypes of other pediatric, adolescent, and young adult (AYA) cancers, especially ultra-rare tumors.
