Robust Accurate Identification of peptides from tandem m
National Library Of Medicine
Abstract
Proteomics is among the most important research areas of the post-genomic era. Recent advances in tandem mass spectrometry (MS/MS) have made large-scale protein identification promising. The key to mass-spectrometry-based proteomics is peptide sequencing. There are in general two approaches to identifying peptides from tandem mass spectrometry data: the library search method and the de novo method. The major challenge in peptide sequencing, whether by library search or de novo, is the better interpretation of statistical significance.
Employing scaling theory from statistical physics, we have developed a systematic method to address the issue of statistical significance assignment. A heuristic version of this statistical assignment is currently implemented in RAId, a coherent method that we developed to identify peptides from their associated tandem mass spectrometry data. RAId performs a novel de novo sequencing step followed by a search in a peptide library that we created. Because the noise in a spectrum depends on the experimental conditions, the instrument used, and many other factors, it cannot be predicted even if the peptide sequence is known; its characteristics can only be uncovered once a spectrum is given. Through our de novo sequencing, we obtain spectrum-specific background score statistics for the library search. When the database search fails to return significant hits, the top-ranking de novo sequences become candidates for new peptides that are not yet in the database.
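As a rough illustration of what a spectrum-specific background can provide, the sketch below ranks a library hit against the scores of de novo candidates generated from the same spectrum. It uses a plain empirical p-value with add-one smoothing, which is a generic stand-in and not the scaling-theory statistics implemented in RAId. The function name and the sample scores are hypothetical.

```python
from bisect import bisect_left


def empirical_p_value(hit_score, background_scores):
    """Fraction of background scores >= hit_score, with add-one smoothing.

    background_scores: scores of de novo candidates for ONE spectrum
    (assumed here: higher score means better match).
    """
    ordered = sorted(background_scores)
    # Number of background scores that are at least as large as the hit.
    n_at_least = len(ordered) - bisect_left(ordered, hit_score)
    # Add-one smoothing keeps the estimate away from exactly zero.
    return (n_at_least + 1) / (len(ordered) + 1)


# Hypothetical de novo candidate scores for a single spectrum.
background = [3.1, 4.0, 4.4, 5.2, 5.9, 6.3, 7.0, 7.8, 8.5]
p_value = empirical_p_value(9.6, background)
```

Because the background is rebuilt for every spectrum, the same raw score can be significant for one spectrum and unremarkable for a noisier one.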
Although RAId has been shown to perform quite well on high-resolution spectra, its performance on low-resolution data is not yet satisfactory. Over the past year, our goal has been to enable RAId to handle such data. We have developed an efficient algorithm to score all possible 4-letter tags covering both termini of the peptide. For low-resolution data, we currently generate only de novo tags instead of the full sequence, a strategy that has turned out to be very effective. We are currently investigating the possibility of incorporating post-translational modifications.
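To make the idea of scoring every 4-letter tag concrete, here is a minimal brute-force sketch for N-terminal tags only. It is not the efficient algorithm referred to above: it simply enumerates all tags and counts how many singly charged b-ion prefixes fall within a tolerance of an observed peak. The function names, the 0.5 Da tolerance, the match-count score, and the example peaks are all assumptions made for illustration. C-terminal tags could be handled analogously with y-ions, whose masses carry an extra water.

```python
from bisect import bisect_left
from itertools import product

# Standard monoisotopic residue masses in daltons.
# "L" stands for both leucine and isoleucine, which have identical mass.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276  # mass added to a singly charged b-ion


def has_peak(sorted_peaks, mz, tol):
    """True if sorted_peaks contains an m/z value within tol of mz."""
    i = bisect_left(sorted_peaks, mz - tol)
    return i < len(sorted_peaks) and sorted_peaks[i] <= mz + tol


def score_n_terminal_tags(peaks, tag_len=4, tol=0.5):
    """Map every tag to the number of its b-ion prefixes matching a peak."""
    sorted_peaks = sorted(peaks)
    scores = {}
    for tag in product(RESIDUE_MASS, repeat=tag_len):
        mass, matched = PROTON, 0
        for residue in tag:
            mass += RESIDUE_MASS[residue]  # m/z of the next b-ion
            matched += has_peak(sorted_peaks, mass, tol)
        scores["".join(tag)] = matched
    return scores


# Hypothetical low-resolution peaks near the b1..b4 ions of a peptide
# that begins with PEPT.
example_peaks = [98.06, 227.10, 324.16, 425.20]
tag_scores = score_n_terminal_tags(example_peaks)
best_tags = [t for t, s in tag_scores.items() if s == max(tag_scores.values())]
```

The wide tolerance mimics low-resolution data. With such data, residues of nearly equal mass (for example Q and K) cannot be told apart, which is one reason short tags are easier to recover than a full sequence.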