Advancing Record Linkage Research: Optimal Linkage Decisions and Propagating Linkage Uncertainty

$150,000 · FY2019 · SBE · NSF

University Of Washington, Seattle WA

Investigators

Abstract

This project will advance research on record linkage. It is increasingly common to find complementary information on individuals scattered across multiple data sources. To take full advantage of these data sources, researchers need to be able to link information on the same individuals. In many applications, however, there are no unique identifiers of the individuals in the datafiles. This makes it difficult to recognize which records correspond to the same individuals. Statistical methodology will be developed for creating merged datafiles and for improving analyses of the linked data. These data linkages will allow richer data analyses and potentially substitute for or facilitate new data collection efforts. Researchers across disciplines will benefit from being able to use statistically rigorous procedures to merge datasets and to carry out analyses with linked data. The ability to create and analyze richer datasets will facilitate understanding of policy options in important areas such as education and health, thus furthering societal interests. A graduate student will be trained as part of this project, and the techniques will be made available as part of free software packages along with tutorials.

This project will use the output of probabilistic record linkage procedures to develop rigorous statistical methodology for creating merged datafiles. Coherent approaches for propagating linkage uncertainty into subsequent analyses will be explored. To create merged datasets, the investigator will derive an estimator of the true linkage of the datafiles. A loss function will be developed through which researchers will be able to give different weights to different types of linkage errors. The linkage estimator will be derived by minimizing the expected value of the researcher-defined loss function. The point estimators also will include the option of "abstaining" from linking records for which the correct links are highly uncertain. To perform statistical analyses with merged data, the investigator will explore procedures in which researchers carry out the statistical analysis they are interested in for each of several plausible linkages of the data and then combine the output from these analyses. The procedures will be validated theoretically, via simulation studies, and using real data analyses.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
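
The loss-based linkage decision described in the abstract can be illustrated with a minimal sketch. It is not the project's estimator: the abstract does not specify the loss function, so the function name `decide_links`, the three loss weights, and the example probabilities are all hypothetical. The sketch also treats each candidate record pair independently, whereas an estimator of the full linkage would likely have to respect constraints such as one-to-one matching. For each pair, given a posterior probability that the two records match, it picks whichever action (link, do not link, or abstain) has the smallest expected loss.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]


def decide_links(
    match_probs: Dict[Pair, float],
    loss_false_link: float = 2.0,   # hypothetical cost of linking a non-match
    loss_missed_link: float = 1.0,  # hypothetical cost of not linking a true match
    loss_abstain: float = 0.3,      # hypothetical fixed cost of abstaining
) -> Dict[Pair, str]:
    """Pick, per candidate pair, the action with the smallest expected loss.

    match_probs maps a record pair to an assumed posterior probability
    that the two records refer to the same individual.
    """
    decisions: Dict[Pair, str] = {}
    for pair, p in match_probs.items():
        expected_loss = {
            "link": (1.0 - p) * loss_false_link,  # wrong only if not a match
            "no_link": p * loss_missed_link,      # wrong only if a match
            "abstain": loss_abstain,              # same cost either way
        }
        decisions[pair] = min(expected_loss, key=expected_loss.get)
    return decisions


# Made-up probabilities for three candidate pairs.
probs = {("a1", "b1"): 0.95, ("a2", "b2"): 0.50, ("a3", "b3"): 0.02}
decisions = decide_links(probs)
for pair, action in decisions.items():
    print(pair, action)
```

With these illustrative weights, the near-certain pair is linked, the near-impossible pair is left unlinked, and the ambiguous pair is set aside. Raising `loss_false_link` relative to `loss_missed_link` makes the rule more conservative about declaring links, which is the kind of trade-off the researcher-defined loss is meant to expose.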
