Advancing data and metadata standards for proteomics mass spectra

$385,400 | R24 | FY2018 | GM | NIH

Institute For Systems Biology, Seattle WA

Abstract

Project Summary: Mass spectrometry (MS)-based proteomics is currently the most widely used technology for the analysis of complex protein mixtures. It can detect and quantify the abundance of thousands of proteins and their variants, post-translational modifications, and interactions per experiment. The Proteomics Standards Initiative (PSI) has developed a robust set of open, standardized data formats for encoding data and metadata from most stages of MS proteomics analysis. However, there is currently no standardized mechanism for universally referencing a spectrum that is used in an analysis or held up as evidence for a published claim. Further, despite the widely recognized and significant advantages of spectrum matching approaches, an approved PSI standard for the storage and exchange of reference spectra in the form of spectral libraries is still glaringly absent. Here we propose a major advancement in data standards for proteomics mass spectra through the development of three interrelated standards. First, to address the difficulty of identifying and accessing a specific spectrum in resources throughout the world, we will develop a universal spectrum identifier standard that can be widely used to reference, locate, and access a specific spectrum. Second, building on PSI's extensive experience in developing widely used official standard formats, we will overhaul the current set of crude spectral library formats and develop a new standardized, comprehensive spectral library format that will be effective for the storage, use, and exchange of reference spectra. Third, we will develop a standard application programming interface that deploys the standards to the whole community by enabling users and automated software to query and exchange information about spectra, peptides, and proteins. These standards will be developed according to the effective methodologies that the PSI has developed since its inception in 2002.
This means that we will assemble the important stakeholders from all over the world to jointly develop the standards and to create specification documents and examples. These specification documents then undergo the official PSI document process, which subjects each proposed standard to three rounds of iterative review and refinement. We will then develop open-source software that enables the use of these standards in multiple programming languages in order to promote widespread usage. Finally, we will implement these standards via these software libraries at the three largest ProteomeXchange proteomics data repositories, which will ensure high visibility. The development of these three interrelated standards will achieve a substantial advance for the field of proteomics MS and may well extend to MS-based metabolomics.
