A Complex Disease Genetics Knowledge Provider for Biomedical Data Translator
Broad Institute, Inc., Cambridge MA
Investigators
Abstract
A major goal of the Biomedical Data Translator Program is to facilitate disease classification based on molecular and cellular abnormalities. While many experimental approaches exist to interrogate molecular or cellular processes, few can discern which among a host of potential abnormalities are relevant to disease in the human system. Genetic variants associated with disease are unique in providing molecular alterations causally related to human disease risk. There are two types of genetic associations. Rare disease associations can (usually) be clearly linked to a gene and are well represented by catalogs such as ClinVar, OMIM, and Monarch. Complex disease associations are harder to interpret because they (a) are statistical rather than qualitative and (b) usually lie in noncoding genomic regions that cannot be immediately translated to molecular or cellular abnormalities. Many complementary resources to help in the biological translation of complex disease associations have recently emerged, broadly classifiable as either “functional genomic” datasets (e.g. from epigenomic profiling or chromatin capture) or predictive bioinformatic methods (e.g. that integrate various genetic and functional genomic datasets to predict disease-susceptibility genes or pathways). These resources require expertise to curate and interpret, and there is as yet no knowledge source that integrates them to interpret complex disease associations. Furthermore, techniques for harmonizing heterogeneous functional genomic datasets with respect to one another are not yet established, most predictive bioinformatic methods specify complex data-processing pipelines that have not yet been scaled to run across many diseases, and there are few if any “gold standards” to evaluate the molecular or cellular abnormalities identified by these resources. The goal of our proposed project is to address these gaps within a complex disease genetics Knowledge Provider for Translator.
We are experts in complex disease genetics and maintain the Knowledge Portal Network (KPN), a collection of open source web portals and Smart APIs that make integrated genetic and genomic datasets publicly accessible for >180 complex diseases. We have built the KPN by developing a protocol for working with disease experts to aggregate and curate high-confidence genetic datasets, building computational pipelines to harmonize these data and apply predictive bioinformatic methods upon them, and extracting relationships mined from these data into a Neo4J graph database. We propose to use the KPN as a foundation to implement a Translator Knowledge Provider of high-confidence complex disease associations and predicted disease-relevant molecular and cellular abnormalities. We will implement this Knowledge Provider by (a) expanding the data sources, data types, and bioinformatic methods integrated within the KPN; (b) developing new computational algorithms to improve the ability of genetic data to identify molecular and cellular abnormalities underlying complex disease; (c) maintaining REST services provisioning Translator with these resources; and (d) developing methodologies for evaluating the accuracy and internal consistency of these data, further curating them, and defining use cases of them within Translator. In so doing, we will enable Translator users to address questions such as:
• What genes are causally linked to complex disease [X], and with what confidence?
• What is the increase in risk for complex disease [X] when gene [Y] is perturbed?
• What pathways are enriched for associations with complex disease [X]?
• What tissues mediate the pathogenesis of complex disease [X]?
• What other diseases are genetically correlated with complex disease [X]?
We participated in the Translator feasibility study and contributed important insights to the project vision including (a) a unifying architectural model of Translator (based on interviews with each Translator team) closely followed by OTA-19-009; (b) the concept of Translator as a tool to augment (rather than replace) human reasoning; and (c) the idea of a “Turing test” to evaluate Translator capabilities. Our expertise in human genetics and hypothesis-driven science, but also computer science and computational biology, ideally positions us to collaborate with NIH staff and other awardees to help guide Translator data integration in a scientifically rigorous manner.