A New Machine Learning Framework for Single-Cell Multi-Omics Bioinformatics
University of Pittsburgh, Pittsburgh, PA
Abstract
Recent developments in single-cell omics technologies enable multi-modality measurements at the genome, transcriptome, epigenome, or proteome scale, which will lead to unprecedented insight into, and resolution of, fundamental biological processes. The project will construct a novel bioinformatics framework with advanced machine learning models, efficient computational tools, and user-friendly software for single-cell multi-omics data analysis. The outputs will be available online to the public and are expected to impact the biological research community and empower scientists working on single-cell data to test biological hypotheses effectively, especially through knowledge extraction from massive, high-dimensional, and complex datasets. The project will facilitate the development of novel educational tools to enhance curriculum design. Minority students and under-served populations will be engaged in cutting-edge research activities.
The project focuses on designing principled machine learning and bioinformatics algorithms for analyzing large-scale single-cell multi-omics data and on creating toolkits to facilitate biological research. Specifically, the research team will investigate 1) a new cross-modal deep canonical correlation self-supervised autoencoder for multi-modal single-cell data integration, 2) new computational methods to study the associations between single-cell RNA-seq data and protein markers via semi-supervised deep neural networks, 3) interpretation algorithms that enhance predictive models by using structure semantic information and identified biomarkers, 4) a statistical inference framework for identifying and inferring conditional dependence from single-cell data, 5) a novel transformer-based variational autoencoder model for super-resolution spatial transcriptomics, 6) tool portal development for single-cell data analysis to advance biology research, and 7) validation of the proposed methods and system using real large-scale single-cell data.
The project is innovative in integrating large-scale machine learning and data-intensive computing for single-cell bioinformatics and holds great promise for understanding biological mechanisms and advancing biomedicine. The results of the project can be found at: https://sites.pitt.edu/~heh45/NSF2225775.html
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.