Using Natural Language Processing and Speech Processing Techniques to Understand and Predict Cognitive Decline
State University New York Stony Brook, Stony Brook NY
Investigators
Abstract
Alzheimer's disease and related dementias are a significant public health concern, especially for disaster responders and construction workers exposed to neurotoxic hazards. Although valuable data about occupational exposure and cognitive health can be obtained from open-ended questions, the resulting unstructured text and audio data are time-consuming to analyze and often underutilized. Recent advances in Natural Language Processing (NLP) and Speech Processing (SP) offer new opportunities to efficiently extract this information and predict cognitive outcomes. This proposal aims to leverage NLP and SP tools to analyze unstructured free text and audio data from the World Trade Center (WTC) responder cohort, a population at high risk for cognitive decline due to disaster-related neurotoxic exposures. I developed and validated a linguistic tool (NLP method) to extract occupational exposure variables, termed "WTC Exposure to Response Activities (WERA)," from free text descriptions of work activities. In the F99 phase, these WERA variables will be used to predict mild cognitive impairment (MCI) incidence, cognitive trajectory, and neurodegenerative biomarker distributions, with mask usage as a potential mediator. This NLP method will advance research in occupational cognitive health by reducing reliance on structured lists and manual categorization of occupational activity exposures. In the K00 phase, I will expand the research by utilizing advanced NLP and SP techniques to analyze interview transcripts and audio recordings. By extracting both linguistic and acoustic features (such as word-finding difficulties, reduced vocabulary, pitch, pauses, and vocal quality) through pre-trained models like RoBERTa and Wav2Vec, I aim to predict future changes in domain-specific cognitive functions.
These features will be processed using machine learning models (e.g., random forest regressor, support vector regression, and neural networks) to predict cognitive decline over time. This approach offers a non-invasive, scalable, and cost-effective method for early detection of cognitive impairment, potentially benefiting other at-risk populations, including veterans and older adults exposed to neurotoxic hazards.
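As a rough illustration of what transcript-level linguistic features can look like, the following is a minimal sketch in plain Python. It is not the proposal's method: the abstract describes features derived from pre-trained models such as RoBERTa and Wav2Vec, whereas this sketch uses simple hand-crafted stand-ins (type-token ratio as a proxy for reduced vocabulary, filler rate as a proxy for word-finding difficulty). The function name `linguistic_features`, the filler list, and the `[pause]` annotation format are all assumptions made for the example.

```python
import re

# Hypothetical filler tokens, used here as a crude marker of word-finding difficulty.
FILLERS = {"um", "uh", "er", "hmm"}


def linguistic_features(transcript: str) -> dict:
    """Return simple lexical features for one interview transcript."""
    # Count pause annotations, assumed to be written as "[pause]" by the transcriber.
    pause_count = len(re.findall(r"\[pause\]", transcript, flags=re.IGNORECASE))

    # Drop all bracketed annotations, then tokenize into lowercase words.
    text = re.sub(r"\[[^\]]*\]", " ", transcript)
    tokens = re.findall(r"[a-z']+", text.lower())

    # Separate content words from fillers.
    words = [t for t in tokens if t not in FILLERS]
    filler_count = len(tokens) - len(words)

    return {
        "word_count": len(words),
        # Unique words divided by total words; lower values suggest a narrower vocabulary.
        "type_token_ratio": len(set(words)) / len(words) if words else 0.0,
        # Share of all tokens that are fillers.
        "filler_rate": filler_count / len(tokens) if tokens else 0.0,
        "pause_count": pause_count,
    }


sample = "I went to the, um, the [pause] the place where we uh worked."
features = linguistic_features(sample)
print(features)
```

A feature dictionary like this, computed per participant and visit, could serve as one row of input to the regression models named above. In practice, hand-crafted features of this kind would complement rather than replace learned embeddings.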