GrantIndex

How Does Automated Record Linkage Affect Inferences about Population Health?

$232,500 · R21 · FY2017 · AG · NIH

University Of Michigan At Ann Arbor, Ann Arbor MI

Abstract

Our broad research objective is to create the Longitudinal Intergenerational Family Electronic Micro-dataset (LIFE-M), spanning the late 19th and 20th century United States. Using automated record linkage technology, the LIFE-M project combines millions of vital records to reconstruct how and why individuals' health has changed across time. This multi-generational, longitudinal micro-database aims to transform research on health and longevity, on childbearing and family structure, and on the long-run health effects of early-life circumstances and exposures. In creating LIFE-M, however, we have encountered serious deficits in knowledge about the performance of automated record linkage technology. The proposed project seeks to evaluate the performance of the most popular and cutting-edge automated linking techniques for the purposes of creating longitudinal health data. Our specific aims are to (1) produce systematic evidence regarding the performance of automated record linking algorithms in terms of match rates, representativeness of the linked sample, erroneous matches (type I errors), and systematic measurement error; (2) examine how phonetic name-cleaning methods affect quality metrics; and (3) examine how record quality metrics vary across underrepresented subgroups (including women, racial/ethnic minorities, and immigrants) and determine how linking methods affect representativeness and inferences. To achieve these aims, we have developed new partnerships with record linking experts, allowing us to incorporate the most cutting-edge methods in record linking. We will also rely on new "ground truth" generated by the LIFE-M project's independent, double-blind human review process. This project will contribute significantly to existing knowledge about the use of automated linking methods for creating longitudinal and intergenerational health data. It will also increase knowledge about potential sources of bias in health studies. Both contributions should greatly enhance the quality of descriptive and causal inferences about population health and aging and disparities in these outcomes.
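
As a rough illustration of two ideas named in the aims, the sketch below shows one common phonetic name-cleaning method (American Soundex) and a way to compute a match rate and an erroneous-match (type I) rate against hand-reviewed links. It is not the LIFE-M project's code. The function names, the dictionary layout of the links, and the choice of Soundex are all assumptions made for this example; the abstract does not say which phonetic methods or metric definitions the project uses.

```python
# Illustrative sketch only: names and data layout are hypothetical,
# not taken from the LIFE-M project.

_SOUNDEX_GROUPS = (
    ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
    ("L", "4"), ("MN", "5"), ("R", "6"),
)
_CODES = {ch: digit for letters, digit in _SOUNDEX_GROUPS for ch in letters}


def soundex(name: str) -> str:
    """Return the 4-character American Soundex code for a name."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = _CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters that share a code
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            digits.append(digit)
        prev = digit  # a vowel resets prev, so a repeated code after it is kept
    return (first + "".join(digits) + "000")[:4]


def linkage_metrics(auto_links: dict, truth_links: dict, n_records: int) -> dict:
    """Compare automated links with hand-reviewed links.

    auto_links:  record id in file A -> linked record id in file B.
    truth_links: record id in file A -> true record id in file B,
                 or None when reviewers found no true match.
    n_records:   number of records in file A that the algorithm tried to link.
    """
    # Only links that human reviewers also examined can be scored as right or wrong.
    reviewed = {a: b for a, b in auto_links.items() if a in truth_links}
    false_links = sum(1 for a, b in reviewed.items() if truth_links[a] != b)
    return {
        "match_rate": len(auto_links) / n_records,
        "false_match_rate": (
            false_links / len(reviewed) if reviewed else float("nan")
        ),
    }


if __name__ == "__main__":
    # Two spellings that a phonetic cleaner would treat as the same name.
    print(soundex("Robert"), soundex("Rupert"))
    auto = {1: "x", 2: "y", 3: "z"}
    truth = {1: "x", 2: "q", 4: "w"}
    print(linkage_metrics(auto, truth, n_records=6))
```

Computing the same metrics separately for each subgroup (for example by sex, race/ethnicity, or nativity) would be one simple way to look at the representativeness question raised in aim (3).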

View original record on NIH RePORTER →