
Developing an Integrated Rare Disease Bioinformatics Resource to Determine Phenotype to Genotype Correlations

$1,408,504 · ZIA · FY2023 · TR · NIH

National Center For Advancing Translational Sciences


Abstract

The team manually curated data on two rare diseases previously studied in the TDB preclinical development project portfolio: Creatine Transporter Deficiency (CTD) and Farber Disease (FD). These test data sets were used to train and validate natural language processing (NLP)-based artificial intelligence (AI) algorithms that search PubMed for rare disease gene variant associations. During this period, the team released version 1.0 of the web application. More than 125 public data sources were identified and are continuously being integrated into the data platform. The rare disease nomenclature and ontologies have been harmonized with the Genetic and Rare Diseases Information Center (GARD) database to support consistent searches by gene, variant, and disease of interest. The team also implemented analytical and visualization tools that support detailed views and interactive exploration of 2D/3D protein structures and gene variants. Additionally, all published, manually curated literature on CTD (SLC6A8) and FD (ASAH1) was fully integrated to provide high-accuracy genotype-phenotype correlation data. Development of the literature AI feature began, with the goal of mining relevant rare disease details and making this information accessible to researchers, clinicians, and patient caregivers. The current user interface supports mining abstracts of published, peer-reviewed literature and querying a gene and/or disease and their association. AI algorithms then retrieve publications associated with these inputs, and deeper exploration can reveal molecular details of genetic variation and target protein structure. Customized implementations of novel NLP models will help assess how well these models fit rare disease research. Work has also begun to incorporate rare disease animal model modules and to present areas for infusing equity into rare disease research.
Efforts are ongoing to establish a comprehensive standard operating procedure and methodology for accessing and incorporating genomic variant and clinical data from non-public databases and sources.
