DARSaW: Developing, Assessing, and Refining Synthetic Sampling Weights to Improve Generalizability of the All of Us Research Program Data

$224,480 | R21 | FY2023 | MD | NIH

Vanderbilt University Medical Center, Nashville TN

Abstract

Project Summary: The All of Us Research Program (All of Us) is a large-scale initiative to collect and study multimodal data from over one million participants living in the United States (U.S.). Studies have shown that disease prevalence estimates from All of Us differ significantly from those for the broader U.S. population, potentially because traditionally underrepresented groups are overrepresented in the cohort. The representativeness of All of Us for the target U.S. population is limited because the data are collected through a non-probabilistic sample design. This proposal aims to leverage two types of external data resources from the U.S. population to construct reliable Synthetic sampling Weights (SaW) for All of Us that mimic a probabilistic sample design and improve generalizability. The first external data resource, the National Health and Nutrition Examination Survey (NHANES), provides a nationally representative dataset with validated sampling weights and publicly available individual-level data. However, NHANES's sample size is relatively small, which can result in under-coverage. The second external data resource, the U.S. Census together with the American Community Survey (ACS), consists of large-scale nationwide surveys that provide more extensive, but aggregated, demographic and housing information about the U.S. population, compensating for this limitation of NHANES. However, individual-level data are not available from these sources. Using the external data resources available in NHANES, the U.S. Census, and ACS, this project will develop, assess, and refine Synthetic sampling Weights (DARSaW) to improve the generalizability of All of Us to the target U.S. population. In Aim 1, we will develop the SaW for All of Us by leveraging the individual-level data from NHANES and the rich but aggregated summary statistics from the U.S. Census and ACS.
In Aim 2, the effectiveness of the SaW will be assessed through case studies comparing unweighted and SaW-weighted estimates of obesity, hypertension, and disability. We will iterate between Aims 1 and 2, refining the SaW whenever discrepancies appear by post-calibrating to broader and deeper aggregated statistics from the target population. The goal of this proposal is to demonstrate the ability of the SaW to improve the generalizability of the All of Us data, enabling researchers to draw valid conclusions about the target U.S. population.
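
To make the post-calibration idea concrete, the following is a minimal sketch of raking (iterative proportional fitting), one standard way to adjust unit weights so that weighted category shares match aggregated population margins. It is not taken from the proposal and may differ from the method the project actually uses. The function names `rake` and `weighted_prevalence`, the toy records, and the target shares are all hypothetical; no All of Us, NHANES, Census, or ACS data are involved.

```python
# Illustrative sketch only: toy records and made-up target margins,
# not All of Us, NHANES, Census, or ACS data.


def rake(records, margins, base_weights=None, max_iter=100, tol=1e-10):
    """Iterative proportional fitting of unit weights to aggregate margins.

    records: list of dicts holding categorical variables.
    margins: {variable: {category: target population share}}.
    Returns weights that sum to 1 and match each margin when feasible.
    """
    n = len(records)
    weights = list(base_weights) if base_weights is not None else [1.0 / n] * n
    for _ in range(max_iter):
        max_shift = 0.0
        for var, targets in margins.items():
            # Current weighted total in each category of this variable.
            totals = {cat: 0.0 for cat in targets}
            for rec, w in zip(records, weights):
                totals[rec[var]] += w
            # Scale each unit so the category totals hit the targets.
            for i, rec in enumerate(records):
                factor = targets[rec[var]] / totals[rec[var]]
                max_shift = max(max_shift, abs(factor - 1.0))
                weights[i] *= factor
        if max_shift < tol:
            break  # every margin already matches; weights are stable
    return weights


def weighted_prevalence(records, outcome, weights=None):
    """Share of (weighted) units whose outcome equals 1."""
    if weights is None:
        weights = [1.0] * len(records)
    total = sum(weights)
    hits = sum(w for rec, w in zip(records, weights) if rec[outcome] == 1)
    return hits / total


# Hypothetical convenience sample that over-represents the "older" group.
sample = [
    {"age": "older", "sex": "F", "hypertension": 1},
    {"age": "older", "sex": "F", "hypertension": 1},
    {"age": "older", "sex": "M", "hypertension": 1},
    {"age": "older", "sex": "M", "hypertension": 0},
    {"age": "younger", "sex": "F", "hypertension": 0},
    {"age": "younger", "sex": "M", "hypertension": 0},
]
# Hypothetical aggregate targets (stand-ins for census-style summaries).
targets = {"age": {"older": 0.4, "younger": 0.6}, "sex": {"F": 0.5, "M": 0.5}}

weights = rake(sample, targets)
unweighted = weighted_prevalence(sample, "hypertension")
calibrated = weighted_prevalence(sample, "hypertension", weights)
print(f"unweighted: {unweighted:.3f}, calibrated: {calibrated:.3f}")
```

With these toy inputs the script prints `unweighted: 0.500, calibrated: 0.300`. Down-weighting the over-represented group changes the prevalence estimate, which is the kind of unweighted versus weighted comparison the case studies describe.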
