
Creating AI/ML-ready data for single cell proteomics

$293,679 | R01 | FY2023 | GM | NIH

Brigham Young University, Provo UT

Abstract

Project Summary: Supplement to 1R01GM147653-01

Brief parent R01 summary

Life is the result of dynamic interactions that occur within and among individual cells. Single cell analyses characterize a sample's diversity and each individual cell's state and ability to respond to the environment. Single cell proteomics (SCP) is rapidly emerging and can quantify more than 1000 proteins per cell, a level of coverage sufficient to categorize cell types and reveal characteristic cellular functions. Significant advances in instrumentation and sample preparation are making SCP more broadly accessible. Yet advances in data acquisition have not been matched by advances in computational tools. The parent award creates algorithms specifically optimized for the unique nature of SCP data and will radically improve the accuracy and coverage of the single cell proteome. The aims of the parent award address algorithmic challenges in spectrum identification (Aim 1) and protein quantification (Aim 2). The project also generates single cell proteomics data for benchmarking purposes (Aim 3).

Goals of the Supplement

The Supplement proposal is a collaboration between the MPIs of the parent award and experts in machine learning. The goal of the supplement is to improve the AI/ML readiness of single cell proteomics data through two primary tasks. First, we will define and implement file formats that are more amenable to machine learning than the current formats for proteomics data and results, which are bloated and insufficient and cannot scale to the volume of data that machine learning requires. Second, we will improve software tools for the capture of metadata, which is essential for describing the experimental design. With the successful completion of these two tasks, we will demonstrate that data from single cell proteomics experiments are immediately usable for a variety of machine learning tasks available at proteomicsML.org.
