Bayesian machine learning for causal inference with incomplete longitudinal covariates and censored survival outcomes
RBHS-School of Public Health, Piscataway, NJ
Abstract
Population cohort studies funded by the National Institutes of Health, including the Atherosclerosis Risk in Communities (ARIC) Study and the Multi-Ethnic Study of Atherosclerosis (MESA), are widely used in cardiovascular research and have provided fundamental knowledge for cardiovascular disease (CVD) prevention strategies and public health policies. Pooling data across multiple cohorts provides a unique opportunity for in-depth investigations of emerging CVD research questions that heretofore would not have been possible, such as the optimal blood pressure threshold for initiating antihypertensive treatment in young adults. While the pooled cohorts data form fertile ground for innovative research, the methodological issues they raise cannot be effectively addressed by existing statistical methods. There are three main analytic challenges. First, many discrete or continuous longitudinal variables have missing values with various missing data patterns. Existing methods either are susceptible to misspecification biases or do not provide coherent estimates of imputation uncertainty, and they cannot handle data that are missing not at random. Second, current causal inference methods require either aligned measurement time points or parametric assumptions about the forms of causal pathways, neither of which can be satisfied in complex longitudinal health data. Third, violations of the “sequential ignorability” assumption embedded in causal inference methodology can be a source of bias, and sensitivity analysis methods for time-varying confounding with censored survival outcomes are underdeveloped. To overcome these challenges and improve statistical and CVD research, we propose a suite of generalizable statistical methods utilizing machine learning.
We propose to develop a scalable Bayesian nonparametric (BNP) framework that imputes continuous or discrete longitudinal covariates that are missing at random while providing coherent uncertainty intervals, and that addresses the missing-not-at-random mechanism via sensitivity analysis; we will apply this framework to missing data in several longitudinal CVD risk factors, such as blood pressure and cholesterol levels (Specific Aim 1). We further propose to develop a robust and computationally efficient BNP causal inference method (Specific Aim 2) and a new continuous-time marginal structural survival model from a Bayesian perspective (Specific Aim 3) to study and validate the survival effects of time-varying antihypertensive treatments for young adults and the frail elderly; to develop a flexible and interpretable survival sensitivity analysis method to assess the sensitivity of the causal effect estimates to varying degrees of sequential unmeasured confounding (Specific Aim 4); and to create usable R software packages for all proposed methods, along with tutorial papers and short courses, to bridge theoretical and practical knowledge and promote use of our methods (Specific Aim 5).