Statistical peak detection, adaptive classification and protein-protein network construction using mass spectra

$149,391 | FY2008 | MPS | NSF

University Of Louisville Research Foundation Inc, Louisville KY

Abstract

The main goal of this proposal is to develop novel and improved statistical methods for analyzing high-dimensional proteomic data generated by mass spectrometers. These data usually consist of spectra, each with thousands of features, and those features contain both true protein/peptide signals and noise. The proposal focuses on three interconnected, sequential goals: 1) separating true peaks from chemical noise in a mass spectrum using statistical modeling and hypothesis testing; 2) comprehensively evaluating and aggregate-ranking a number of classification techniques for distinguishing case and control samples from their proteomic profiles, and constructing an adaptive classifier that is expected to outperform the individual classifiers under an ensemble of performance measures; and 3) constructing, by reverse engineering, a protein-protein association network from the peaks that truly discriminate between classes in a case-control study. The overall and ultimate goal of the proposed research is to study how these three pieces perform when applied in sequence, in order to understand the inner workings of proteins in a case-control study based on mass spectrometry data.

High-throughput proteomic profiling using mass spectrometry has enormous potential in scientific and biomedical research. Identifying proteomic biomarkers for complex diseases and conditions such as cancer, acute renal disorder and fetal alcohol syndrome from easily available bodily fluids such as blood, plasma, urine, amniotic fluid and serum could be very beneficial. These biomarkers are expected to be much more sensitive and specific than existing ones, and hence better suited to early detection and prevention of such diseases and conditions. Proteomic signature profiling can also be used to quickly identify biological agents (for example, anthrax), an application that illustrates its relevance to homeland security. Similarly, proteomic profiling of bodily fluids from subjects exposed to different environmental toxins can be useful.

However, the complexity of these data poses new statistical challenges, so appropriate analytic tools are much needed to make proper use of them. The proposed research is expected to make a significant contribution to this relatively new area of research. Last but not least, the analytical and computational tools developed for this project can be used to analyze other types of high-dimensional data.
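
As a rough illustration of the second goal, aggregate ranking of classifiers under several performance measures, the sketch below uses a Borda count. This is an assumption made for illustration only: the abstract does not say which rank-aggregation method the project uses, and the classifier names, measures and scores are hypothetical placeholders rather than results.

```python
# Hypothetical scores (higher is better) for three classifiers on three
# performance measures. These numbers are invented for illustration.
scores = {
    "accuracy":    {"svm": 0.90, "random_forest": 0.88, "lda": 0.85},
    "sensitivity": {"svm": 0.80, "random_forest": 0.86, "lda": 0.82},
    "auc":         {"svm": 0.91, "random_forest": 0.93, "lda": 0.87},
}


def borda_aggregate(scores):
    """Rank classifiers by total Borda points across all measures.

    Within each measure the worst classifier gets 0 points, the next
    gets 1, and so on. Ties within a measure are not treated specially
    in this sketch.
    """
    points = {}
    for by_classifier in scores.values():
        worst_to_best = sorted(by_classifier, key=by_classifier.get)
        for pts, name in enumerate(worst_to_best):
            points[name] = points.get(name, 0) + pts
    # Highest total first; break ties in the total alphabetically.
    ranking = sorted(points, key=lambda name: (-points[name], name))
    return ranking, points


ranking, points = borda_aggregate(scores)
print(ranking)
print(points)
```

Here no single classifier is best on every measure, which is the situation where an aggregate ranking, and an adaptive classifier built on it, becomes useful.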
