Data Coordination and Analysis Center

$392,003 · U19 · FY2016 · NS · NIH

University Of California, San Francisco, San Francisco CA

Investigators

Pierre-Antoine Gourraud (contact), Marcelo Fernandez-Vina

Abstract

The Data Coordination and Analysis Center (DCC) will collate, QC, and format for analysis the immunogenetic results generated by Projects 1 and 2 and the clinical/demographic information provided by the Disease Groups. A highly effective, centralized data coordination effort in an advanced and secure environment is crucial to the success of a study of this scale; it includes diligent internal and external communications and multi-channel dissemination of the data. The DCC will be located at UCSF and, under the supervision of Dr. Gourraud, will serve as the repository of the Project phenotypic and laboratory informatics systems. The DCC will be electronically linked to the Project BioRepository and will centralize communications with BISC. The vast amount of experimental data to be collected requires a reliable informatics platform and a systematic approach to establishing data integrity, completeness, and accessibility. We will use a combination of relational and object-oriented databases with file servers, building on the leadership group's previous experience with multi-dimensional, metacentric, and high data-consumption projects. This cloud-based data and knowledge infrastructure is accessed through an API that serves as a data and computation gateway: it centralizes data import and export, transmits computation requests, and monitors data usage while controlling user-defined credential levels. In addition, the DCC team will lead the analysis for three types of specific HLA and KIR studies: disease association, disease endophenotype association, and across-disease / across-ancestry analyses. This multi-dimensional strategy will allow the generation of HLA- and KIR-specific, reproducible analytical variables and the application of consistent statistical models to all INDIGO datasets. Data access via the API will make possible the automated integration of INDIGO data resources with existing immunogenetic data-sharing initiatives. Using the community-based IDAWG model for promoting data standards, the DCC will help address the emerging challenges posed by HLA/KIR NGS. Finally, the DCC will make available tools for generating aggregated statistics through web tools connected to the open end of the INDIGO API and, when possible, will enable computation with third-party datasets using open-source software and scripts produced by the program co-investigators.
