HashtagHealth
National Institute of Nursing Research
Investigators
Abstract
We summarize activities on our research projects.
1. Designing and implementing an AI-powered chatbot for health information
Rosie is a multilingual chatbot built on a three-stage AI pipeline: a retriever, a re-ranker, and a generative model. For English queries, Rosie uses Contriever to retrieve passages, TART to re-rank the top 10 passages, and Llama 3.1 to generate responses. The Spanish version employs mContriever, Multilingual-E5-large, and Llama 3.1, respectively.
To assess Rosie's effectiveness, we conducted a randomized controlled trial with 400 pregnant or postpartum women (14+ years old, infants <6 months) from 49 U.S. states. Analysis of baseline characteristics found no statistically significant differences between treatment and control participants. We sent midpoint surveys to participants who reached the 6-month mark (n=390) and obtained a response rate of 97%. Additionally, 303 participants have reached the 12-month mark and been sent the endpoint survey, with a response rate of 91%. Preliminary analyses of the midpoint survey results suggest a decrease in mothers' emergency room visits compared to baseline. Midpoint data suggest that Rosie users had high confidence in the quality of responses, found answers easy to understand, and reported high usability and satisfaction. Midpoint data also suggest sustained engagement with the app, with a slight, expected decline over time comparable to general application usage trends.
2. Advances in Computer Vision for Neighborhood Built Environment Characterization
Audits of built environment features have heretofore been conducted using costly and time-consuming onsite visits or manual annotation of street view imagery. The dearth of systematically collected and harmonized data on contextual factors limits research on their impact. Developing data algorithms that can analyze street imagery will dramatically reduce cost and increase the availability of neighborhood data.
To this end, we will develop data mining and data management techniques to systematically sample street imagery and advance computer vision models to create neighborhood built environment indicators relevant to health outcomes. We have piloted the construction of several such indicators using 165 million Google Street View (GSV) images from the United States and have achieved accuracy rates above 85% for sidewalks, crosswalks, street trees, mixed land use, streetlights, and number of lanes.
Using this data repository, we have conducted several studies. We found that neighborhood characteristics can account for 75-82% of the variation in mental, physical, and general health. In another study, the presence of sidewalks was significantly associated with physical activity, outperforming other walkability indicators. We also mapped GSV-derived built environment indicators onto neighborhood deprivation indices. In urban areas, visible utility wires and chain link fences were associated with greater neighborhood deprivation scores, while street greenery was associated with lower neighborhood deprivation. In rural areas, single-lane roads were associated with increases in neighborhood deprivation, while non-single-family homes (an indicator of mixed land use) were associated with decreases in deprivation. Furthermore, examining changes in Washington, DC, from 2014 to 2019, we observed a shift towards higher-density housing and increases in single-lane roads, suggesting compact urbanization. Longitudinal analyses revealed that increased construction activity was associated with lower rates of obesity, diabetes, high cholesterol, and cancer.
We are extending this computer vision work to produce built-environment predictors of motor vehicle collisions. Each year, approximately 1.35 million people die in road traffic collisions globally, and road collision injuries are the top cause of death for young people aged 5-29.
To provide much-needed data and empirical findings, we will (1) develop computer vision techniques to produce vehicle collision risk indicators, (2) measure the accuracy of data algorithms and construct an interactive geoportal, and (3) use our data repository, Collision Vision, and a large collection of injury and fatality records to evaluate built environment impacts on motor vehicle collision risk. We have experimented with deep learning models, such as DETR (Detection TRansformer) and Vision Transformer (ViT) models, and achieved high accuracy for traffic infrastructure and neighborhood characteristics. Using vehicle collision data from the National Highway Traffic Safety Administration's Fatality Analysis Reporting System (FARS), we found that sidewalks, streetlights, street greenness, and single-lane roads were associated with marked reductions in fatal vehicle collisions.
Ongoing modeling work aims to leverage more advanced foundation models. While ImageNet-pretrained models perform well, we suspect a domain gap exists between ImageNet images and our GSV dataset. To address this, we employed the unsupervised DINO method to refine these models further. Training ViT-S and VMamba-S on 1 million GSV images improved performance on neighborhood characterization tasks. The ViT-B model, pretrained on ImageNet-22k with DINO, showed excellent performance on streetlight detection but struggled with other indicators, consistent with the suspected domain gap.
Additionally, human annotation, whether through crowdsourcing or research staff, is both time-consuming and costly. To minimize human annotation costs and predict vehicle collision risk indicators for both common objects (e.g., streetlights, cars, stop signs, traffic signs) and rarer objects (e.g., speed bumps, roundabouts), we investigated several foundation detection models capable of open-vocabulary detection.
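As a rough illustration of open-vocabulary prompting, the sketch below joins several indicator names into a single text prompt and tallies the detections returned for each. It is not the project's code: the function names, the score threshold, and the mock detections are hypothetical, and the period-separated prompt format is an assumption based on the convention commonly used with GroundingDINO-style detectors.

```python
# Hypothetical sketch: one text prompt covering common and rare indicators.
# All names and the 0.35 threshold are illustrative assumptions.

COMMON = ["streetlight", "car", "stop sign", "traffic sign"]
RARE = ["speed bump", "roundabout"]


def build_prompt(categories):
    """Join category phrases into one lowercase, period-separated prompt."""
    cleaned = [c.strip().lower() for c in categories if c.strip()]
    return " . ".join(cleaned) + " ."


def count_detections(detections, categories, min_score=0.35):
    """Count detections per category at or above min_score.

    `detections` is a list of (phrase, score) pairs, standing in for
    whatever an open-vocabulary detector would return for one image.
    """
    counts = {c: 0 for c in categories}
    for phrase, score in detections:
        if score >= min_score and phrase in counts:
            counts[phrase] += 1
    return counts


categories = COMMON + RARE
prompt = build_prompt(categories)

# Made-up detector output for a single image, for illustration only.
mock_detections = [
    ("streetlight", 0.81),
    ("streetlight", 0.42),
    ("speed bump", 0.20),   # below threshold, ignored
    ("stop sign", 0.67),
]
counts = count_detections(mock_detections, categories)
```

Adding or removing an indicator only changes the `categories` list; the prompt and the per-category tally follow from it.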
This capability allows us to manually adjust prompts to detect all indicators of interest simultaneously, in contrast to the traditional detection pipeline, where categories must be predicted one by one. Specifically, we selected two recent models for further experimentation and fine-tuning: GroundingDINO (2023) and GroundedSAM (2024).
3. Natural language processing of large unstructured text data for health outcomes research
Social media platforms offer a relatively untapped and extensive source of public discussion from which health professionals can develop targeted interventions. Analyzing over 60,000 Facebook posts, we identified adverse event mentions in posts referencing GLP-1 receptor agonist (GLP-1 RA) medications. We found that gastrointestinal-related symptoms, headaches, joint pain, hypertension, and pancreatitis were among the most mentioned adverse events.
Additionally, we are conducting a study that uses natural language processing algorithms to extract social determinants of health (SDOH) from clinical notes. The WHO defines SDOH as "the conditions in which people are born, grow up, live, work, and age," which can impact up to 80-90% of a patient's health and clinical outcomes. Despite their significance, SDOH are often not systematically assessed in clinical settings and are mainly found in unstructured notes or electronic health records (EHRs). Our analysis will compare encoder-decoder and decoder models for extracting SDOH from clinical notes in a sepsis patient cohort from the MIMIC-IV database. We will examine associations between SDOH and 30-day mortality following suspected sepsis in ICU patients.
Moreover, we are designing a study that uses longitudinal records of clinical notes to develop an algorithm for earlier detection of spondyloarthritis (SpA), a chronic inflammatory disease affecting the spine and joints. Common features include inflammatory back pain, stiffness, and dactylitis, often leading to misdiagnosis and delays in treatment.
We hypothesize that an algorithm can be developed from EHRs to identify undiagnosed SpA cases based on established guidelines. The project will involve data collection, algorithm development, and validation. We aim to improve early diagnosis, allowing for timely treatment and better patient outcomes. Integrating the algorithm's risk score into EHR systems could alert healthcare providers to the need for further evaluation, ultimately reducing diagnostic delays and enhancing quality of life for patients.