Collaborative Research: SCH: Personal Determinants of Health Enhanced Machine Learning Models for Early Prediction of Alzheimer's Disease and Related Dementias
University of Pennsylvania, Philadelphia, PA
Abstract
This project advances national health and promotes science and technology development by providing algorithms, software, and systems that can train machine learning models on electronic health records (EHRs) for accurate and early prediction of Alzheimer’s Disease and Related Dementias (ADRD). ADRD is a severe neurodegenerative disorder that affects more than 5 million people over the age of 65 and is characterized by progressive memory and cognitive impairment and personality changes, which can progress to dementia and death. Early prediction of ADRD is crucial for timely intervention and improved patient outcomes. Recent studies have shown that personal risk factors such as education, employment, lifestyle, and family history significantly influence ADRD onset and progression. However, these factors are not recorded in a structured format within existing EHRs. Instead, they are often embedded in the free text of clinical notes or discharge summaries, which is not easily searchable, computable, or standardized. This creates a major technical barrier to integrating them into ADRD prediction models. To address this, the project develops a computational platform that uses novel machine learning and natural language processing to automatically extract personal risk factors from EHR clinical narratives and leverage them for accurate and early prediction of ADRD. This research significantly improves ADRD prediction accuracy and timeliness, with potential generalization to other neurological disorders. By exploring the interaction between personal and clinical factors in disease development, this project pushes the boundaries of current knowledge in machine learning and ADRD research, potentially transforming approaches to early detection and management of complex neurological disorders.
To achieve the goal of developing machine learning models for early ADRD prediction that are enhanced with personal risk factors, this project pursues four research thrusts, each addressing a key methodological challenge. First, the project develops a domain-knowledge-guided large language model to extract risk factors from EHR clinical narratives, one that can cope with the complexities inherent in real-world narratives, such as noise and incomplete data entries. Second, the project develops an interpretable method using neural additive models that automatically identifies each individual risk factor’s contribution to early ADRD prediction. Third, building on these interpretable results, the project develops a survival-based ADRD prognosis model that can estimate the likelihood of ADRD development at any given point in the future, capturing the dynamics of the risk trajectory. This approach can enhance clinical decision-making by identifying high-risk individuals who may benefit from more intensive care or early intervention. Fourth, the project constructs a personalized knowledge graph that integrates personal and other clinical risk factors into a unified format, capturing the overall health status of each individual at risk of developing ADRD. Moreover, the project develops adaptive machine learning algorithms that can dynamically update this knowledge graph to incorporate evolving risk factors. Together, these approaches address the fundamental limitations of existing ADRD risk prediction models, such as the inability to handle complex and unstructured data, insufficient interpretability, and high computational overhead.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
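As an illustration only of the survival-based prognosis idea described in the abstract (estimating the likelihood of ADRD development at any given future time), the following minimal sketch converts per-interval hazard estimates into a cumulative risk curve. It is not the project's model: the discrete-time formulation, the function name `cumulative_risk`, and the example hazard values are all assumptions made for this sketch.

```python
from typing import List


def cumulative_risk(hazards: List[float]) -> List[float]:
    """Turn discrete-time hazards into cumulative risk.

    hazards[k] is the (hypothetical) probability of onset during interval k,
    given no onset before it. The survival probability after interval k is
    S_k = prod_{j <= k} (1 - hazards[j]); the cumulative risk is 1 - S_k.
    """
    survival = 1.0
    risks = []
    for h in hazards:
        if not 0.0 <= h <= 1.0:
            raise ValueError("each hazard must be a probability in [0, 1]")
        survival *= 1.0 - h            # probability of remaining onset-free
        risks.append(1.0 - survival)   # probability of onset by end of interval
    return risks


# Hypothetical yearly hazards for one individual (illustrative values only).
example = cumulative_risk([0.1, 0.2, 0.25])
print([round(r, 3) for r in example])
```

In a sketch like this, the risk curve is non-decreasing by construction, which is what lets a clinician read off the estimated risk at any chosen horizon.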