CRII: RI: New Methods for Learning to Personalize from Observational Data with Applications to Precision Medicine and Policymaking

$175,000 | FY2017 | CSE | NSF

Cornell University, Ithaca NY

Investigators

Abstract

Personalization has long been a central problem in machine learning, with successful applications in news and product recommendation, where personalized recommendation models are usually trained on repeated, cheap experiments. A question of growing importance is how to translate this success to emergent problems such as precision medicine, where personalization appears to be key. The project will develop new methods, backed by solid theory, to address increasingly urgent problems in personalization and its applications to precision medicine and policymaking. The project will also investigate these applications directly, with the aim of developing specific guidelines that practitioners can follow. More generally, the research will lead to progress at the intersection of machine learning and causality, which in turn will advance our understanding of decision making from large-scale data. In precision medicine, the methods developed as part of this research will lead to improved patient outcomes through statistically efficient learning of the best way to personalize based on demographic and genetic characteristics. The research also has impact on policymaking, where personalization can be used to target educational interventions and improve the success of programs aimed at reducing recidivism, which in turn will reduce rates of incarceration and corrections spending. Implementations of the new personalization methods will be distributed as free, open-source packages for R and Python. These packages will provide a complete toolset for doctors, sociologists, and other scientists or practitioners to develop highly effective personalization models for their applications based solely on observational data. The research effort includes training and advising graduate and undergraduate students, with an emphasis on engaging with groups underrepresented in the field. Research results will be disseminated in public fora, including diversity-focused venues that offer an added outreach opportunity.

In medicine and related contexts, experimentation can be prohibitively small-scale, costly, dangerous, and/or unethical in comparison to passive data collection. Fortunately, massive and ever-expanding datasets are available, including hospitals' electronic medical records, with increasingly rich data available from increased genotyping practices. However, such datasets are purely observational and non-experimental: the isolated causal effect of a particular treatment is hidden by myriad confounding factors and needs to be carefully extracted. Because standard approaches based on predictive analyses fall short in this setting, urgent methodological questions arise as to how to adapt the success of black-box machine learning to the prescriptive purpose of learning how to personalize treatments for maximal causal effect based on completely observational data. The purpose of this research project is to advance current machine learning methodology to meet this emerging challenge by developing personalization theory, methods, and applications. Personalization is at the core of machine intelligence theory and applications. The problem of learning to personalize has been an exciting area of research over the last decade, with a strong focus on collaborative filtering and recommendation applications for web services. At the same time, within the machine learning community, there has been tremendous growth of interest both in causal inference from observational data and in medical applications. The research will result in advances in machine learning and causal inference and in stronger connections between machine learning, causal inference, personalization, and medicine.
