Machine Learning and Deep Learning Solutions Supplement: Matching Methods for Causal Inference with Complex Data
Duke University, Durham NC
Investigators
Linked publications & trials
Abstract
NARRATIVE SUMMARY The landscape of data formats is rapidly expanding, with image, text and other complex formats becoming available for health related outcomes. By considering such data within the context of observational causal inference, they can be leveraged to improve clinical decisions, help evaluate treatment efficacy by estimating individualized treatment effects and help develop intelligent therapeutic systems where individualized treatments can be deployed. In R01EB025021, we concentrate on understanding how nearly exact matching can be achieved in the presence of a large number of categorical covariates. The proposed approach (called FLAME - Fast Large Almost Matching Exactly) is able to quickly learn which categorical covariates are important and to produce high quality matches \citep{wang2017flame,dieng2018collapsing}. The main shortfall in the proposed work for R01EB025021 is that it does not naturally extend to more complex data types, it only works for categorical data in which each feature is meaningful. {\bf This proposal will develop new statistical and computational tools for causal analysis of complex data structures.} Our new approach is called {\emph Matching After Learning to Stretch (MALTS)}. For each unit (e.g. patient), we propose learn a latent representation of their covariate information and a distance metric on the latent space such that units that are matched tend to provide accurate estimates of treatment effect. MALTS can use deep learning to encode the latent representations for the units, or it can learn basis transformations in linear space (stretching and rotation matrices) for simpler continuous data types. We will develop the MALTS algorithm, and apply it in a medical context. Our goal is to construct high quality matches for the following types of data: (i) medical images, such as x-rays and CT scans, (ii) medical record data, (iii) time series data (continuous EEG data), (iv) a combination of any of the first three types of data. We aim to leverage the newly developed tools to continue our evaluation of the efficacy of isolation for flu-like ailments as well as to apply them more broadly to publicly available modern datasets such as the MIMIC III database.
View original record on NIH RePORTER →