Bayesian machine learning for complex missing data and causal inference with a focus on cardiovascular and obesity studies
University of Florida, Gainesville, FL
Abstract
Project Summary
This proposal will develop Bayesian machine learning approaches based on Bayesian nonparametrics (BNP) for three purposes: to handle nonignorable missingness (in outcomes and covariates) and conduct causal inference with electronic health records (EHRs), to address missingness in multivariate longitudinal data, and to address causal mediation problems. Missing data remains a problem in clinical studies, particularly in studies using EHRs. In clinical studies, considerable effort is spent trying to minimize missingness, yet it still remains a problem; in studies based on EHRs, missingness is a constant and less controllable issue. In addition, there has been limited work on using auxiliary information in EHRs to improve the handling of missing data. Approaches for missingness in multivariate longitudinal data are underdeveloped, yet they are relevant across many clinical trial settings, from cost-effectiveness analysis to incomplete time-varying auxiliary covariates (or confounders), causal mediation, and multiple outcomes of interest. The mechanisms of treatment effectiveness are of particular interest in behavioral trials: specifically, how do different processes mediate the effect of an intervention? Answering this question can inform the design of future interventions. However, determining the causal effect of such "mediators" on outcomes is difficult. We will develop new approaches to identify these effects in the complex setting of cluster randomized trials, for which little work has been done. For all these settings, a Bayesian approach is ideal because it allows one to appropriately characterize uncertainty about unverifiable assumptions (which are present in all these problems) and offers the flexibility of BNP models. However, MCMC algorithms for BNP can converge slowly and can become computationally untenable for large n.
We will extend existing approaches to address both of these complications; this will be important for all the applications considered and, more generally, given the increasing size and complexity of data. The methods are motivated by several NHLBI-funded studies, whose PIs are co-investigators on this proposal, and will be developed to help answer numerous important clinical questions, including the mechanisms of behavior change in weight management and the impact of linkage (and engagement) to care on treatment effectiveness for blood pressure outcomes. The methods will also help us evaluate potentially synergistic effects when drugs with potential diabetogenic effects are used concomitantly, and whether the impact on cancer outcomes varies by type of bariatric surgery. The history of collaboration among the entire study team will help produce the best science and facilitate dissemination of our methodological and clinical findings. We will disseminate code for these methods (via the PI's GitHub page and software papers) to ensure that the methods are readily usable by investigators involved in cardiovascular, obesity, diabetes, and cancer studies.