Cloud-Based Machine Learning and Biomarker Visual Analytics for Salivary Proteomics
Ada Forsyth Institute, Inc., Cambridge MA
Abstract
Cloud computing and open-access data platforms provide essential bioinformatics resources for data analysis and knowledge discovery in support of data-intensive research such as proteomics. The parent grant for this supplement proposal, DE016937, "A Foundation for the Oral Microbiome and Metagenome," has the goal of providing curated 'omic resources to help the scientific community understand how microbe-microbe and microbe-host interactions in the oral cavity affect human health and disease. Saliva is a biofluid with complex composition and function that has the potential to help the broader medical community diagnose and treat disease. This is a proposal to test the utility of cloud-based resources for improving our understanding of saliva and its associated large proteomic datasets. This information will be integrated into the parent grant via the Human Salivary Proteome Wiki (HSP Wiki), which was developed to aggregate and curate proteome information from human saliva. Currently, this database allows members of the oral and biomedical research communities to explore various proteomic datasets, with limited visualization tools for salivary composition, interactions, and initial clustering. While our HSP Wiki pipeline allows users to interrogate the data by surveying health and disease patterns, there are gaps in salivary biomarker discovery, including a lack of visual analytic tools and of tools to study protein-protein interactions. In addition, most cutting-edge machine learning (ML) methods have not been applied to proteomic data analysis, including analysis of the salivary proteome. The goal of this proof-of-concept project is to develop cloud-based tools that enhance the current HSP Wiki visual analytics, implement ML, and improve our interpretation of salivary-derived proteins, leading to more efficient and reliable salivary biomarker discovery.
To this end, the first aim is to develop a cloud-based visual analytics platform to support machine learning identification of salivary proteomic biomarkers. We will test and quantify how effectively and efficiently parallel computing and machine learning can process large salivary proteomic datasets for novel biomarker discovery, compared with conventional desktop tools. The second aim is to develop a scalable, on-demand cloud pipeline to implement protein structure prediction using AlphaFold2, along with a comparison tool for 3D models using MolStar. The expected outcome is to provide the research community with enhanced tools and online resources to better discover biomarkers in health and disease, for both host and microbial proteins. We will quantify the efficiency of these tools and their effectiveness in generating new discoveries of protein structure. The cloud-based interface, visual analytical tools, informatic pipelines, and parallel computing will be broadly disseminated to catalyze discovery, enhancing rigor and transparency in proteomic research and in translational applications to salivary biomarkers.