Developing Statistical Tools for Data Integration and Data Fusion for Finite Population Inference

$375,000 | FY2023 | SBE | NSF

Iowa State University, Ames IA

Abstract

This research project will develop statistical learning tools for data integration and data fusion. Given the proliferation of new data sources, researchers are increasingly utilizing convenient but often uncontrolled big data sources, web survey panels, and administrative data. Data integration is an emerging field of study that combines multiple data sources in a reliable way. However, statistical tools for data integration are limited. The results of this research will significantly impact the analysis of complex survey data with big data, as well as scientific conclusions drawn from multiple data sets. The investigator plans to actively collaborate with researchers at different statistical agencies, and applications will be conducted to demonstrate the value of the new methods in various settings. The results of this research will be disseminated via publications, presentations, short courses, webinars, and software. Graduate students will be involved in the conduct of the research.

This research project will produce statistical and machine learning tools for data integration and fusion. Statistical agencies face increasing pressure to utilize convenient but often uncontrolled sources of data. While such data sources provide timely data for a large number of variables and population elements, they often fail to represent the target population of interest because of inherent selection biases. By using an independent probability sample as a calibration sample, the selection bias in the convenience sample can be reduced; however, the statistical tools for data integration are not yet satisfactory. In survey sampling research, statistical inference combining multiple data sources is a relatively understudied topic. This research will expand the scope of survey data analysis by providing numerous statistical and machine learning tools for data integration and by enhancing the use of data integration through example applications. The investigator will address important research topics such as mass imputation using modern machine learning tools, propensity score weighting using information projection, calibration weighting with high dimensional covariates, multiple bias calibration for data integration, and optimal estimation and sampling design for data fusion.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
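
The idea of correcting a convenience sample with an independent probability sample, as in the mass imputation topic named in the abstract, can be illustrated with a small simulation. This is only a sketch under stated assumptions and not the investigator's method: the population, the linear outcome model, the logistic selection mechanism, and the function name `mass_imputation_demo` are all hypothetical. An ordinary least squares fit stands in for the "modern machine learning tools" mentioned above. The outcome y is observed only in the biased convenience sample, while the probability sample observes the covariate x alone.

```python
import numpy as np


def mass_imputation_demo(seed=0, N=100_000, n_prob=1_000):
    """Toy comparison of a naive convenience-sample mean and a mass-imputation mean."""
    rng = np.random.default_rng(seed)

    # Hypothetical finite population: y depends linearly on x (an assumption).
    x = rng.normal(0.0, 1.0, N)
    y = 2.0 + 3.0 * x + rng.normal(0.0, 1.0, N)
    true_mean = y.mean()

    # Convenience sample: inclusion probability rises with x, which creates
    # selection bias. Selection depends on x only, not on y given x.
    p_incl = 1.0 / (1.0 + np.exp(-(x - 2.0)))
    in_conv = rng.random(N) < p_incl
    naive_mean = y[in_conv].mean()

    # Probability sample: simple random sample without replacement.
    # It observes x but not y, and every unit has design weight N / n_prob.
    prob_idx = rng.choice(N, size=n_prob, replace=False)
    weights = np.full(n_prob, N / n_prob)

    # Fit the outcome model on the convenience sample by least squares.
    X_conv = np.column_stack([np.ones(in_conv.sum()), x[in_conv]])
    beta, *_ = np.linalg.lstsq(X_conv, y[in_conv], rcond=None)

    # Mass imputation: predict y for every probability-sample unit, then
    # take the design-weighted mean of the imputed values.
    y_hat = beta[0] + beta[1] * x[prob_idx]
    mi_mean = np.sum(weights * y_hat) / np.sum(weights)

    return true_mean, naive_mean, mi_mean


if __name__ == "__main__":
    true_mean, naive_mean, mi_mean = mass_imputation_demo()
    print(f"population mean:       {true_mean:.3f}")
    print(f"naive convenience:     {naive_mean:.3f}")
    print(f"mass imputation (OLS): {mi_mean:.3f}")
```

In this toy setting the naive mean is pulled upward because units with large x are over-represented, while the imputed estimate inherits the representativeness of the probability sample. The correction relies on the assumed outcome model being adequate and on selection depending only on observed covariates; neither is guaranteed in practice.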
