An Explainable Machine Learning Platform for Single Cell Data Analysis
University Of Virginia Main Campus, Charlottesville VA
Investigators
Abstract
Rapid advances in single-cell RNA sequencing technologies have made it possible to capture gene signatures within the fundamental units of life, single cells. This enables the discovery and characterization of cell types, including novel ones, in multicellular organisms; the study of cell-cell communication and the complex interactions among cell types in tissues; spatially resolved mapping of organs at the single-cell level; and the identification of genes and pathways affected in specific cell types of an organism under different contexts. The project introduces a set of novel, explainable machine learning approaches tailored to single-cell data analysis. It will develop a platform for explainable machine learning that supports advanced analysis of single-cell RNA sequencing data and makes explainable predictions that directly link phenotypes with genes and pathways in specific cell types. The approaches from this project translate to any tabular dataset with a small sample size or many variables, both of which are prevalent in biological research. The project’s single-cell sequencing data analysis tools, results, and generated data will serve as exemplars of research projects for exposing undergraduate, graduate, women, and minority students to the development and application of explainable machine learning approaches that advance our understanding of biology at the single-cell level. Moreover, developing and applying powerful machine learning tools that yield interpretable results from complex biological datasets, and making these methods, tools, and results broadly accessible, will maximize their value, accelerate biological discovery, and advance our understanding of cellular and molecular biology. Specifically, the project will develop a platform of novel machine learning approaches that generate disentangled representations of cells and genes in latent spaces for single-cell RNA sequencing data, which can be used to make explainable predictions of phenotypes.
The project comprises three synergistic tasks: (1) develop explainable cell-prototype-based approaches to single-cell RNA sequencing data analysis, (2) develop explainable concept-based machine learning models for single-cell RNA sequencing data analysis, and (3) develop machine learning methods to generate representative single-cell expression data from bulk RNA sequencing data. Through these tasks, the project develops algorithms, models, and tools that enable the full power of state-of-the-art explainable machine learning to be applied to single-cell data analysis and phenotype prediction at single-cell resolution. The algorithms and tools produced by this project will be broadly applicable to predicting genes and pathways at the single-cell level in different organisms. The platform can significantly enhance the utility of single-cell data and assist researchers in analyzing their own datasets. The results of the project can be found at: https://www.cs.virginia.edu/~az9eg/website/projects.html. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.