Robust Accurate Identification of peptides from tandem m

$0 | Z01 | FY2006 | LM | NIH

National Library Of Medicine

Investigators

Linked publications & trials

Paper 19918268 Paper 18954448 Paper 18597684 Paper 18558733 Paper 17983478 Paper 17961253 Paper 16105903 Paper 16090285 Paper 15548452 Paper 15509610

Abstract

Proteomics is among the most important research areas of the post-genomic era. Recent advances in tandem mass spectrometry (MS/MS) have made large-scale protein identification promising. The key to mass-spectrometry-based proteomics is peptide sequencing. There are in general two approaches to identifying peptides from tandem mass spectrometry data: the library search method and the de novo method. The major challenge in peptide sequencing, whether by library search or de novo, is to better interpret statistical significance.

Employing the scaling theory from statistical physics, we have developed a systematic method to address the issue of statistical significance assignment. A heuristic version of this statistical assignment is currently implemented in RAId, a coherent method developed by us to identify peptides from their associated tandem mass spectrometry data. RAId performs a novel de novo sequencing step followed by a search in a peptide library that we created. Because the noise in a spectrum depends on the experimental conditions, the instrument used, and many other factors, it cannot be predicted even if the peptide sequence is known. The characteristics of the noise can only be uncovered once a spectrum is given. Through our de novo sequencing, we obtain the spectrum-specific background score statistics for the library search. When the database search fails to return significant hits, the top-ranking de novo sequences become candidates for new peptides that are not yet in the database.

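
The idea of judging a library hit against spectrum-specific background scores can be illustrated with a minimal sketch. It assumes the background is simply a list of de novo scores collected for one spectrum and uses a plain empirical tail fraction; this is a stand-in for illustration, not the scaling-theory assignment implemented in RAId. The names `empirical_p_value`, `e_value`, and `library_size` are hypothetical.

```python
from bisect import bisect_left


def empirical_p_value(hit_score, background_scores):
    """Fraction of background scores >= hit_score for ONE spectrum.

    Assumption: background_scores are de novo scores from the same
    spectrum. A +1 pseudocount keeps the estimate above zero.
    """
    ordered = sorted(background_scores)
    n = len(ordered)
    # bisect_left gives the index of the first score >= hit_score
    n_at_least = n - bisect_left(ordered, hit_score)
    return (n_at_least + 1) / (n + 1)


def e_value(hit_score, background_scores, library_size):
    """Expected number of chance hits at this score in a library of
    library_size candidates (hypothetical parameter)."""
    return empirical_p_value(hit_score, background_scores) * library_size
```

A hit whose score exceeds every background score gets the smallest p-value the sample can support, `1 / (n + 1)`, so a larger background sample is needed to resolve more significant hits.
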
Although RAId has been shown to perform quite well on high-resolution spectra, its performance on low-resolution data is not yet to our satisfaction. For the past year, it has been our goal to enable RAId to deal with such cases. We have developed an efficient algorithm to score all possible 4-letter tags covering both termini of the peptide. For low-resolution data, we currently implement the strategy of generating only de novo tags instead of the full sequence. This turns out to be very effective. We are currently investigating the possibility of incorporating post-translational modifications.
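
To make the notion of scoring 4-letter terminal tags concrete, here is a brute-force sketch. It assumes singly charged b-ions for the N-terminal tag and y-ions for the C-terminal tag, and it scores a tag by the number of matched peaks within a tolerance. Both the scoring rule and the exhaustive enumeration are illustrative assumptions; the efficient algorithm mentioned above is not described in this record. The function names and the `alphabet` parameter are hypothetical.

```python
from bisect import bisect_left
from itertools import product

# Monoisotopic residue masses in daltons (I and L are isobaric).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.00728
WATER = 18.01056


def count_matches(residues, sorted_peaks, offset, tol):
    """Count cumulative fragment masses that fall within tol of a peak."""
    total, hits = offset, 0
    for aa in residues:
        total += RESIDUE_MASS[aa]
        i = bisect_left(sorted_peaks, total - tol)
        if i < len(sorted_peaks) and sorted_peaks[i] <= total + tol:
            hits += 1
    return hits


def best_terminal_tags(peaks, tol=0.5, length=4, top=5, alphabet=None):
    """Return (n_term, c_term): top (score, tag) pairs for each terminus."""
    sorted_peaks = sorted(peaks)
    letters = sorted(set(alphabet)) if alphabet else sorted(RESIDUE_MASS)
    n_term, c_term = [], []
    for combo in product(letters, repeat=length):
        tag = "".join(combo)
        # b-ions grow from the N-terminus: walk the tag left to right
        n_term.append((count_matches(tag, sorted_peaks, PROTON, tol), tag))
        # y-ions grow from the C-terminus: walk the tag right to left
        c_term.append(
            (count_matches(tag[::-1], sorted_peaks, PROTON + WATER, tol), tag)
        )

    def rank(pairs):
        # highest score first, ties broken alphabetically
        return sorted(pairs, key=lambda st: (-st[0], st[1]))[:top]

    return rank(n_term), rank(c_term)
```

With the full 20-letter alphabet this enumerates 160,000 tags per terminus. At low-resolution tolerances several tags can tie, because I/L are isobaric and K/Q differ by only about 0.04 Da.
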
