Fusion Pursuit for Pattern-Mixture Models with Application to Longitudinal Studies with Nonignorable Missing Data
University Of Pittsburgh, Pittsburgh PA
Investigators
Abstract
Missing data is ubiquitous in scientific research, challenging the accuracy of statistical analyses, the results of which will ultimately generate knowledge and guide policy or decision making. This project aims to develop a suite of new statistical tools to address the challenges in analyzing longitudinal studies with nonignorable missingness, such as informative dropout. The principal investigator will incorporate a machine learning approach termed fusion pursuit into the pattern-mixture modeling framework to achieve more efficient estimation and inference in longitudinal association analyses. The methods will find broad use in survey, medical, and policy research, and in other areas that involve longitudinal studies with a heavy presence of missing data. The project will also integrate research with the training of graduate students, developing trainees in the topics proposed through research involvement and teaching. The project will extend the estimation and inference capabilities of pattern-mixture models in analyzing longitudinal data that are subject to missing not at random. Formulated in the framework of generalized estimating equations, this research develops a post-stratification fusion pursuit strategy to overcome over-stratification by missing-data patterns, which is the bottleneck of pattern-mixture models in analyzing large-scale data sets. The project will first develop a regularization approach to simultaneously collapse redundant missing-data pattern strata and estimate parameters of interest. To ensure valid statistical inference, the project will then develop a post-fusion inference approach to derive valid and generalizable confidence regions. Finally, the project will demonstrate the developed approaches in three situations of real-world longitudinal studies, including missing visits, missing covariates, and distributed data. The research project is expected to broaden the use cases of pattern-mixture models in analyzing longitudinal studies with missing data. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
View original record on NSF Award Search →