
Semiparametric Techniques for Data Exploitation across Heterogeneous Populations

$224,999 | FY2023 | MPS | NSF

University Of Wisconsin-Madison, Madison WI

Investigators

Abstract

In fields ranging from clinical medicine to policy research, researchers often have access to data from multiple populations that are related but not identical. For example, in biomedical studies that use electronic health records, relying solely on labeled data may be inefficient, because resource constraints often keep the labeled sample small. In a clinical trial setting, physicians may need to interpret evidence from a randomized controlled trial whose participants' demographics and other historical characteristics differ considerably from those of their own patients. Similarly, researchers studying a pneumonia outbreak during the flu season may find a predictive model developed during the non-flu season relevant and useful. In all of these scenarios, it is crucial to develop methods that can appropriately incorporate information from one population into statistical analyses of another. This project will develop a suite of statistically sound methods for integrating external data into primary studies. The resulting methods could be applied in research areas such as Alzheimer's disease, mental health disorders, cancer, and pain. The project also includes active mentoring plans at both disciplinary and interdisciplinary levels, benefiting local high school students, undergraduates, master's and PhD students, and biomedical investigators.

The project will leverage a unique combination of semiparametric statistics, robust statistical methods, statistical learning techniques, missing data analysis, and high-dimensional data analysis to develop these methods for incorporating external data into primary studies. The new methods require minimal model assumptions: each is either developed under an assumption-lean framework or tolerates misspecification of more than one nuisance model in the procedure. Compared to naive methods that do not incorporate external data, the new methods are guaranteed to increase estimation efficiency, improve statistical power, and enhance scientific discovery. Moreover, they achieve maximal efficiency gains when the nuisance models are correctly specified.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
