Interactive Informatics Resource for Research-driven Cancer Proteomics

$423,583 U01 FY2016 CA NIH

Battelle Pacific Northwest Laboratories, Richland WA

Investigators

Linked publications & trials

Paper 35220718 Paper 30638385 Paper 29092938 Paper 27696597 Paper 25931120 Paper 25855118 Paper 25433089

Abstract

DESCRIPTION (provided by applicant): In 2013, over 1.6 million new cases of cancer are expected to be diagnosed and over 580,000 people are expected to die of the disease. Thus, continued research in the identification of new diagnostic and prognostic biomarkers of cancer is necessary. Although cancer is widely recognized as a genomic disease, the directives of the DNA-based drivers are executed at the level of proteins and their biological functions, and the application of potential protein-level biomarkers remains a compelling vision. Thus, a large investment has been made by NCI and other research centers in high-throughput global proteomics experiments to mine for novel biomarkers of cancer. However, few of these markers have come to fruition. We believe that one of the major challenges to the discovery of robust protein- or pathway-biomarker candidates from these large and complex proteomics datasets is the use of naive data analysis approaches that do not take into account the underlying complexity of the proteome (e.g., splice variants, post-translational modifications). State-of-the-art statistical algorithms that account for the experimental design have been developed to improve quality assessment, peptide and protein quantification, and pathway modeling; however, access to these methodologies by the larger community is hindered because they are in the prototype stage and typically require knowledge of statistical programming. Furthermore, the likelihood of these tools moving to robust software is low, since they are developed within the context of existing grants that do not support the transition from prototype to software. For the field of clinical proteomics to successfully identify new mechanistic etiologies of cancer, it requires not only high-quality data with respect to the instrument, but also high-quality statistical analysis of the data.
This project proposes new informatics technology in the form of a robust, interactive, and cross-platform software environment that will enable biomedical and biological scientists to perform in-depth analyses of global proteomics data, from quality assessment and normalization of raw inferred abundances (e.g., peak area) to the identification of protein biomarkers and enriched pathways. The software will be designed in a single programming language (Java) to ensure easy installation across platforms, with wizard-based data entry and advanced data reporting. Java will also support the development of advanced graphical user interfaces for data presentation and interactive graphics with a modern look and feel. This approach will ensure that scientists outside of the development institution can develop modules to include in the software, or extensions for data integration, without the challenges of re-compiling the application. The software modules to be developed under this project are Aim 1) peptide- and protein-level quality assessment and quantification, Aim 2) protein biomarker discovery via exploratory data analysis and machine learning, and Aim 3) pathway biomarker discovery through integration with the NCI Protein Interaction Database.
