MTM 1: An explainable AI system for microbiome characterization and microbiome-based host-phenotype prediction

$500,000 · FY2020 · BIO · NSF

Indiana University, Bloomington IN

Investigators

Yuzhen Ye (contact), Haixu Tang, Thomas G Doak

Abstract

Microbiome research is going through a revolutionary transition from characterizing reference microbiomes associated with different environments and hosts to translational applications, including using the microbiome for disease diagnosis, improving the efficacy of cancer treatments, and preventing disease. The success of these translational applications relies on identifying differential microbiome markers (e.g., species, genes, and pathways) that can distinguish different groups of microbiome data (e.g., healthy individuals versus patients). It is also important to understand the factors that influence the gut microbiome and the strategies for manipulating the microbiome to augment therapeutic responses and disease prevention. Existing approaches for microbiome-based human host phenotype prediction typically lack explainability and treat each disease individually, even though some diseases have been shown to share similar microbiome characteristics. This project aims to address these issues and develop an explainable AI system for microbiome-based phenotype prediction. The investigators will use the discoveries from this project in undergraduate and graduate teaching, and for outreach education through summer camps for high school students, giving them the opportunity to experience the entire process from sample collection to building microbiome classifiers.

This project will develop an explainable AI system for microbiome characterization and host phenotype prediction based on microbiome data. The proposed microbiome AI system relies on a network of human-associated bacteria, to be inferred by integrating genome-scale metabolic modeling (of metabolic competition and complementarity between bacterial species) and co-occurrence profiling of microbial organisms across a large number of microbiome datasets.
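
To make the co-occurrence profiling step concrete, the following is a minimal sketch, not the investigators' method. It assumes a presence/absence table per taxon and uses the Pearson correlation of those binary vectors with a fixed cutoff as a stand-in for whatever association measure the project adopts. The taxon names, the toy data, the 0.6 threshold, and the function names are all hypothetical, and the metabolic modeling component is not covered.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a taxon that never varies carries no co-occurrence signal
    return cov / sqrt(var_x * var_y)


def cooccurrence_edges(presence, threshold=0.6):
    """Return {(taxon_a, taxon_b): r} for pairs with |r| >= threshold.

    presence maps a taxon name to a list of 0/1 flags, one per sample.
    Positive r suggests co-occurrence; negative r suggests mutual exclusion.
    The threshold is an illustrative assumption, not a recommended value.
    """
    edges = {}
    for a, b in combinations(sorted(presence), 2):
        r = pearson(presence[a], presence[b])
        if abs(r) >= threshold:
            edges[(a, b)] = r
    return edges


# Hypothetical toy data: four taxa observed across six samples.
presence = {
    "taxon_A": [1, 1, 0, 0, 1, 0],
    "taxon_B": [1, 1, 0, 0, 1, 0],
    "taxon_C": [0, 0, 1, 1, 0, 1],
    "taxon_D": [1, 0, 1, 0, 1, 0],
}

edges = cooccurrence_edges(presence)
for pair, r in sorted(edges.items()):
    print(pair, round(r, 3))
```

In this toy table, taxon_A and taxon_B always appear together, taxon_C appears only where they are absent, and taxon_D is only weakly associated with the others, so it contributes no edge at this threshold.
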
The AI system uses a conditional variational autoencoder guided by the inferred bacterial network to model microbial abundances under various host conditions. The autoencoder provides efficient, unsupervised representation learning of the data, and multitask learning leverages microbiome datasets associated with different diseases to alleviate the problem caused by limited training samples. Further, the AI model will incorporate prediction of auxiliary phenotypes to regularize the representation learning. Once trained, the AI system can be used for microbiome-based host phenotype prediction and can explain its predictions through the model’s latent variables. It can also be used to predict the impact of phenotypic alteration, an important problem in microbiome modulation and microbiome engineering.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
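
For readers unfamiliar with the modeling approach named in the abstract, one common way to write a conditional variational autoencoder objective with an auxiliary prediction term is sketched below. This is the textbook form under stated assumptions, not the project's actual loss: the symbols x (microbial abundance profile), c (host condition), z (latent variables), y_k (the k-th auxiliary phenotype), f_k (a hypothetical predictor for that phenotype), the per-phenotype loss \ell_k, and the weight \lambda are all introduced here for illustration. The guidance from the bacterial network is not represented in this form.

```latex
\mathcal{L}(\theta, \phi, \psi)
  = \underbrace{-\,\mathbb{E}_{q_\phi(z \mid x, c)}\bigl[\log p_\theta(x \mid z, c)\bigr]}_{\text{reconstruction}}
  + \underbrace{\mathrm{KL}\bigl(q_\phi(z \mid x, c)\,\|\,p(z \mid c)\bigr)}_{\text{latent regularization}}
  + \lambda \sum_{k} \mathbb{E}_{q_\phi(z \mid x, c)}\bigl[\ell_k\bigl(y_k, f_{\psi_k}(z)\bigr)\bigr]
```

Under this reading, multitask learning would correspond to summing the loss over samples from several disease datasets while sharing the encoder and decoder parameters \phi and \theta, and the auxiliary term would be what regularizes the learned representation.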
