
Development and implementation of statistical machine learning methods to shorten rare disease odysseys

$249,000 · R00 · FY2025 · LM · NIH

Vanderbilt University Medical Center, Nashville TN

Abstract

Patients with rare diseases (RDs) face tremendous physical, psychosocial, and economic suffering in their protracted journeys toward diagnosis and therapy. These journeys, known as diagnostic and therapeutic odysseys, are riddled with diagnostic delays and difficulties finding effective treatment strategies. The Undiagnosed Diseases Network (UDN) at the NIH was established to diagnose individuals who are living with the often-dire consequences of an RD. Despite the UDN’s comprehensive diagnostic approach, 70% of patients remain undiagnosed, highlighting the need for novel diagnostic strategies. The diagnostic approach at the UDN currently relies on manual extraction of RD phenotypes from clinical notes in electronic health records (EHRs), which is laborious and time-consuming. A promising alternative is to leverage natural language processing (NLP) models, which can automatically extract fine-grained RD phenotypes from clinical notes, to support timely diagnosis at the UDN. Existing general NLP models, however, are not suitable for supporting diagnosis at the UDN. Furthermore, NLP models have limited impact on diagnosis due to scarce infrastructure for delivering them to the clinic, highlighting the need to bridge the implementation gap between NLP and practice. Even after diagnosis, patients often undergo therapeutic odysseys. Despite advancements in gene therapy, evidence shows that genetics alone does not account for the wide diversity in RD phenotypes. Exposures also play a critical role, but less is known about how their causal effects vary across individuals. This knowledge gap underscores the need to elucidate the complex phenome-genome-exposome interplay on an individual-level basis, which is crucial in informing personalized disease management strategies. The overall objective of this proposal is to develop and implement advanced statistical machine learning (ML) methods aimed at shortening RD odysseys.
Building on my K99 work, I will develop a novel NLP system to identify, standardize, and prioritize RD phenotypes for diagnosis (Aim 1) and implement it using REDCap at the Vanderbilt UDN to support diagnosis (Aim 2). I will leverage phenomic, genomic, and exposomic data from All of Us and build a causal inference framework that uses modern statistical ML techniques to estimate personalized causal effects of exposures on RD phenotypes (Aim 3). The expected outcomes are a novel, open-source NLP system for RD diagnosis, an implementation framework using REDCap to support timely diagnosis at the Vanderbilt UDN, and an advanced, reproducible causal inference framework to elucidate the complex phenome-genome-exposome interplay underlying RDs on an individual-level basis. This proposal aligns with the PI's expertise in statistical ML, artificial intelligence (AI), and causal inference. Overall, this project can help the PI launch her independent research career in developing advanced statistical ML and AI methods to shorten RD odysseys.
