Integrative framework for surface protein imputation for spatial biology of cancer

$408,829 | R21 | FY2025 | CA | NIH

University Of Pittsburgh At Pittsburgh, Pittsburgh PA

Investigators

Abstract

Project Summary: Tumors are not merely aggregates of malignant cells but well-organized, complex ecosystems. Their spatial organization is strongly related to tumor development, recurrence, metastasis, and treatment response. Spatial transcriptomics (ST) technologies measure genome-wide mRNA expression across thousands of spots on a tissue slice while preserving the location of each spot, thereby delineating these complex tumor ecosystems. Recent technological advances allow simultaneous profiling of genome-wide gene expression and multiplexed surface protein expression alongside histopathological evaluation ("spatial CITE-seq"). Surface proteins are integral markers of specific cellular functions, indicate cell states, and serve as primary targets for therapeutic intervention. There is a growing number of spatial transcriptomics datasets without surface protein measurements. Although machine learning (ML) approaches for cross-data imputation (predicting missing or unobserved data based on available information) exist, they have not been specifically applied to transfer knowledge from spatial CITE-seq to ST for protein imputation. The goal of the project is to increase the utility of spatial transcriptomics datasets by imputing surface protein abundance through incorporation of spatial and/or histological dependencies while simultaneously learning the RNA-protein mapping from spatial CITE-seq. Imputing surface protein expression from spatial transcriptomics data is significant because it enhances our understanding of tumor biology and cellular interactions within spatial domains. Surface proteins are vital for cell signaling and communication in the tumor microenvironment, and their accurate imputation allows researchers to bridge the gap between gene expression and protein function. This process facilitates the discovery of biomarkers for diagnosis and prognosis, informs targeted therapeutic strategies, and provides insights into how tumors evade immune detection.
Our methods will be integrated into software packages for wide accessibility, and we will apply our computational framework to publicly available ST datasets to predict protein abundances. Web resources will feature a user-friendly interface for systematic data visualization, search, and download, and an interactive web application will be developed to allow researchers to input their own ST datasets and use pre-trained models to predict protein abundance. Our work addresses a crucial gap in analysis methods for spatial transcriptomics data and paves the way to better delineate the tumor immune microenvironment for cancer research.
