Bayesian machine learning for complex missing data and causal inference with a focus on cardiovascular and obesity studies
University of Florida, Gainesville, FL
Abstract
Project Summary: This proposal will develop Bayesian machine learning approaches, built on Bayesian nonparametrics (BNP), to handle nonignorable missingness (in outcomes and covariates) and conduct causal inference for electronic health records (EHRs), to address missingness in multivariate longitudinal data, and to solve causal mediation problems. Missing data remain a problem in clinical studies and, in particular, in studies using EHRs. In clinical studies, more effort is spent trying to minimize the amount of missingness, yet it remains a problem; in studies based on EHRs, missingness is a constant and less controllable issue. In addition, there has been limited work on using auxiliary information in EHRs to improve the handling of missing data. Approaches for missingness in multivariate longitudinal data are underdeveloped, yet they are relevant across many clinical trial settings, from cost-effectiveness analysis to incomplete time-varying auxiliary covariates (or confounders) to causal mediation to multiple outcomes of interest. The mechanisms of treatment effectiveness are of particular interest in behavioral trials. Specifically, how do different processes mediate the effect of an intervention? Answering this question can facilitate the construction of future interventions. However, determining the causal effect of such 'mediators' on outcomes is difficult. We will develop new approaches to identify these effects in the complex setting of cluster randomized trials, for which little work has been done. For all these settings, a Bayesian approach is ideal, as it allows one to appropriately characterize uncertainty about unverifiable assumptions (which are present in all these problems) and offers the flexibility of Bayesian nonparametric models. MCMC algorithms for BNP can sometimes converge slowly and can be untenable for large n.
We will extend existing approaches to address both of these complications, which will be important for all the applications considered and, more generally, given the increasing size and complexity of data. The methods are motivated by several NHLBI-funded studies, whose PIs are co-investigators on this proposal, and will be developed to help answer numerous important clinical questions, including the mechanisms of behavior change in weight management and the impact of linkage (and engagement) to care on treatment effectiveness for blood pressure outcomes. The methods will also help us evaluate potentially synergistic effects when drugs with potential diabetogenic effects are used concomitantly, and whether the impact on cancer outcomes varies across different bariatric surgeries. The history of collaboration among the entire study team will help produce the best science and facilitate dissemination of our methodological and clinical findings. We will disseminate code for these methods (via the PI's GitHub page and software papers) to ensure the methods will be readily usable by investigators involved in cardiovascular, obesity, diabetes, and cancer studies.