Bioinformatics, Machine Learning, Systems Biology of Cancers
Division of Basic Sciences - NCI
Abstract
Rhabdomyosarcoma (RMS) remains the most prevalent pediatric soft tissue sarcoma, exhibiting substantial heterogeneity in biology and clinical outcomes. Despite advances in multimodal therapy, long-term survival remains poor for subsets of patients, particularly those with high-risk or fusion-positive alveolar RMS. The complexity of RMS subtypes and the limits of current risk stratification strategies underscore the urgent need for more precise, integrated, and predictive tools. Recent advances in artificial intelligence (AI), particularly deep learning models such as convolutional neural networks (CNNs) and transformers, coupled with the expansion of genomic data resources and digital pathology, offer a transformative opportunity. Our research aims to create a multimodal AI framework that combines histopathologic imaging (H&E), somatic and germline genomics, and clinical parameters to deliver robust, individualized prognostic assessments and inform therapeutic decisions in RMS.

Progress to Date

In our foundational work, we collected and digitized 321 whole-slide hematoxylin and eosin (H&E) images from RMS patients enrolled in Children's Oncology Group (COG) trials between 1998 and 2017. These samples, annotated with detailed clinical and molecular metadata, were processed into image patches and used to train CNNs to recognize features associated with mutation status and event-free survival (EFS). Key results include:

- CNN-based classification of alveolar RMS (ARMS), characterized by PAX3/7-FOXO1 fusions, achieved an area under the receiver operating characteristic curve (AUROC) of 0.85 on an independent set of 136 test samples.
- CNNs trained on mutation-annotated samples identified RAS pathway mutations with AUROC = 0.67, MYOD1 mutations with AUROC = 0.97, and TP53 mutations with AUROC = 0.63.
- CNN predictions of EFS and overall survival (OS) outperformed conventional clinical risk stratification metrics.

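The AUROC values reported above can be read as the probability that a randomly chosen positive slide receives a higher score than a randomly chosen negative one. The sketch below illustrates that reading with made-up numbers. It is not the study's pipeline: the slide data, the `slide_score` helper, and the choice to average patch probabilities into a slide-level score are all assumptions for illustration, since the record does not state how patch outputs were aggregated.

```python
from statistics import mean

def slide_score(patch_probabilities):
    """Hypothetical aggregation: average the per-patch CNN probabilities."""
    return mean(patch_probabilities)

def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for is_pos, s in zip(labels, scores) if is_pos]
    negatives = [s for is_pos, s in zip(labels, scores) if not is_pos]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Toy slides: (is_fusion_positive, per-patch probabilities). Invented values.
toy_slides = [
    (True, [0.9, 0.8, 1.0]),
    (True, [0.6, 0.8]),
    (True, [0.5, 0.3]),
    (False, [0.7, 0.9]),
    (False, [0.2, 0.4]),
    (False, [0.1, 0.3]),
    (False, [0.1, 0.1]),
]

labels = [is_positive for is_positive, _ in toy_slides]
scores = [slide_score(patches) for _, patches in toy_slides]
auroc_value = auroc(labels, scores)
print(f"toy AUROC = {auroc_value:.3f}")
```

On this reading, an AUROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives a sense of scale for the 0.63 to 0.97 range quoted for the individual mutation classifiers.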
This work, published under PMID: 36346688, validated the potential of deep learning to extract meaningful diagnostic and prognostic features directly from histologic images at the time of diagnosis.

Next Phase: Integrated Multimodal Risk Prediction Engine

We are now building a fully integrated RMS risk prediction engine that combines histologic, molecular, and clinical data into a comprehensive AI framework. This includes several innovations:

1. Enhancing CNNs with Large Language Models (LLMs): We are applying transformer-based LLMs adapted for visual tasks to augment CNN capabilities. These models enable better contextual analysis of spatial features in H&E slides and improve performance in mutation and outcome prediction.
2. Attention Heatmaps for Interpretability: We generate interpretable heatmaps using attention mechanisms to highlight regions of diagnostic importance on H&E images. This not only aids in model validation but also enables insights into morphological correlates of genetic alterations such as TP53 and MYOD1 mutations.
3. Multimodal AI Classifiers: We are developing ensemble AI classifiers that integrate:
   - Clinical variables (e.g., age, stage, histologic subtype)
   - Germline genetic risk variants (from Project 1, Aim 1)
   - Somatic mutations (Project 1, Aim 2)
   - CNN/LLM outputs on histopathology (Sub-aims 3a and 3b)
   - Circulating tumor DNA (ctDNA) profiles (Project 2)
   Using Cox regression models, we assess time-to-event outcomes (EFS and OS), aiming to deliver an AI-derived risk score that surpasses traditional risk categories.

Integrating Germline Genetics and Somatic Biomarkers

The 2024 JAMA Network Open study (PMID: 38346688) revealed that RMS patients with germline cancer-predisposition variants (CPVs), particularly in TP53, had significantly worse survival. Importantly, embryonal RMS patients with CPVs had outcomes comparable to fusion-positive ARMS, highlighting the clinical relevance of incorporating germline data into risk modeling.
Additionally, our work (JCO Precis Oncol 2025) integrating somatic biomarkers such as MYOD1, TP53, MET, NF1, CDKN2A, and MYCN mutations into a Gene-Enhanced (GE6) Cox model significantly improved survival predictions over clinical features alone. For example, 5-year EFS estimates shifted dramatically for certain genotypes, underscoring the individualized impact of somatic genomics.

Functional Genomic Insights and Therapeutic Relevance

Beyond prognostication, integrating AI with molecular insights has therapeutic implications. Recent work (Nature Communications 2024) identified selective inhibitors of the histone demethylase KDM3B, which suppress PAX3-FOXO1 activity in fusion-positive RMS. Our models may identify tumors with high KDM3B dependency based on morphologic and genomic features, aiding in stratification for targeted therapy trials. In parallel, dual-target CAR T-cell therapy targeting FGFR4 and CD276 demonstrated potent efficacy in preclinical RMS models (Nat Commun 2024). Our integrated model may assist in predicting antigen expression patterns and suitability for such immunotherapies.

Cloud-Based Infrastructure and AI-Enhanced ClinOmics Portal

The Oncogenomics Section's ClinOmics platform (https://clinomics.ccr.cancer.gov/), enhanced under the CCDI initiative, now hosts AI tools including our RMS CNN. We are transitioning to a cloud-native infrastructure on AWS, enabling large-scale data ingestion, real-time model inference, and user-friendly analytics. Our near-term goals include:

- Fully containerized deployment of the AI pipelines (Nextflow-based) for reproducibility
- Integration with CCDI and Kids First genomics via CAVATICA
- Real-time prediction dashboards within ClinOmics for clinical decision support

Conclusion and Future Directions

Through synergistic application of deep learning, multimodal integration, and high-performance cloud computing, we are revolutionizing RMS prognostication.
The incorporation of CNNs, LLMs, germline/somatic genomics, and ctDNA analytics promises a paradigm shift in precision oncology for pediatric sarcomas. Our ultimate vision is a prospective clinical trial that uses our multimodal AI RMS risk engine to stratify therapy: intensifying treatment for high-risk individuals while exploring de-escalation for those with favorable profiles. This framework, broadly adaptable to other pediatric cancers, exemplifies the power of interdisciplinary translational bioinformatics.