Data Organizing Core

$305,491 | U54 | FY2016 | CA | NIH

University Of New Mexico Health Scis Ctr, Albuquerque NM

Abstract

The main goal of the Data Organizing Core (DOC) of the Illuminating the Druggable Genome Knowledge Management Center (IDG KMC) is to evaluate, organize and rank all prospective disease-linked proteins for four protein superfamilies: G-protein-coupled receptors (GPCRs), nuclear receptors (NRs), ion channels (ICs) and kinases. As the main knowledge repository, the DOC will develop the Target Central Resource Database (TCRD) by combining data extracted from multiple sources, linking disease, pathway, protein, chemical, gene, bioactivity, drug discovery and clinical information elements from databases, literature, patents, drug labels and other documents. TCRD will serve as the central source for the IDG Query Platform, which is being developed by the KMC's User Interface Portal (UIP) core. The DOC will develop tools for algorithmic processing and prediction, which will improve disease-protein associations supported by human curation. Four External Target Panels will curate emerging associations and rank appropriate proteins. The DOC will stratify proteins into four classes (Tclin - clinical; Tchem - manipulated by chemicals; Tmacro - manipulated by macromolecules; and Tdark - the genomic dark matter), supported by tissue and cellular localization data for proteins (TTL) and diseases. Oprea at UNM will lead the DOC, supported by team leaders Brunak and Jensen (Center for Protein Research, Denmark), Overington (European Bioinformatics Institute) and Schurer (University of Miami).

Specific Aims: 1. Develop tools for the automated extraction and processing of data to be deposited into TCRD; 2. Develop tools for the semi-automated extraction of data on pathways, diseases and associated ontologies, which will support TTL stratification; 3. Develop tools for expert curation of literature and patent data, approved drug labels and clinical trials; 4. Develop analytics, modeling and visualization tools for disease-based target prioritization.

Preliminary stratification (e.g., Tclin 22%, Tdark 30%) of disease-protein associations was performed for each protein superfamily using automated tools. Within 12 months, the TCRD-based IDG Query Platform will be operational, improving target prioritization for the research community at large and for the IDG Consortium in exploring the dark matter of GPCRs, NRs, ICs and kinases.
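
The four-class stratification described above can be illustrated with a minimal Python sketch. This is not the DOC's actual tooling: the abstract names the classes but does not give the assignment rules, so the evidence flags, their field names, and the precedence order (Tclin, then Tchem, then Tmacro, with Tdark as the fallback) are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ProteinEvidence:
    """Hypothetical evidence flags for one protein (field names are illustrative)."""
    name: str
    has_clinical_evidence: bool = False        # assumed basis for Tclin
    has_chemical_modulator: bool = False       # assumed basis for Tchem
    has_macromolecule_modulator: bool = False  # assumed basis for Tmacro


def stratify(protein: ProteinEvidence) -> str:
    """Return the first class whose flag is set; the precedence is an assumption."""
    if protein.has_clinical_evidence:
        return "Tclin"
    if protein.has_chemical_modulator:
        return "Tchem"
    if protein.has_macromolecule_modulator:
        return "Tmacro"
    return "Tdark"  # no supporting evidence of any kind


def class_shares(proteins):
    """Fraction of proteins per class, keyed by class label (empty dict if no input)."""
    counts = Counter(stratify(p) for p in proteins)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}
```

Calling `class_shares` on the proteins of one superfamily would yield per-class fractions comparable in form to the preliminary percentages quoted in the abstract, though the real figures depend on the DOC's own criteria and data.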
