Dependable Predictive Inference with Uncertainty-Aware Machine Learning

$160,000 | FY2022 | MPS | NSF

University of Southern California, Los Angeles, CA


Abstract

Complex statistical and machine learning models, including deep neural networks, are widely applied across many fields and are becoming increasingly central to data-driven science, despite serious concerns about their reliability. These models cannot always be trusted, especially in sensitive and high-noise applications such as those found in genomics, and in any context in which machine learning predictions affect people’s health or welfare. A crucial current limitation of machine learning models is that they may not adequately capture uncertainty, so their predictions often tend to be overconfident. Further, machine learning models are known to sometimes reinforce latent biases hidden in the data, and thus they may produce predictions that are systematically biased against certain groups of individuals. Finally, many statistical and machine learning models may perform well on the specific data set on which they are trained, but their predictions are not robust to changing data environments, such as those arising in the genetic analysis of individuals from populations with different ancestries. To address these limitations, this research project will develop general methods for accurate, fair, and robust uncertainty estimation in machine learning. In the specific context of genomics, this work will lead to improved genetic risk prediction across human populations, facilitating further developments in personalized medicine, bridging health disparities across populations, and helping deepen our scientific knowledge of heritable diseases. This project will support education in statistical and machine learning research by providing training opportunities for graduate students. This project will also help promote diversity in statistical and machine learning research by supporting the investigator’s involvement with the Diversity, Inclusion, Access JumpStart initiative of the University of Southern California. In particular, the investigator will offer summer research opportunities for undergraduate students focusing on the topics of this project.

This research consists of three distinct but closely connected parts. The first part will develop novel conformal inference methods to train and calibrate uncertainty-aware machine learning models that are both accurate and reliable. This research will involve the development of novel loss functions and innovative stochastic optimization algorithms. The second part of this project will develop methods for training and calibrating uncertainty-aware machine learning models that treat individuals belonging to different groups fairly, carefully using hold-out observations to correct for possible algorithmic or data biases. The third part of this project will develop methods based on data holdout and conformal inference to construct predictive models that are more robust to possible shifts in the covariate distribution. These models will be able to leverage possible interactions among the available predictive variables and ultimately lead to powerful multivariate models of genetic risk for heritable diseases that may be relied on across different populations.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
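
For readers unfamiliar with the hold-out calibration idea that the abstract refers to, the following is a minimal sketch of standard split conformal prediction for regression. It is not the project's method, which the abstract describes only in general terms. The function name `split_conformal_interval`, its parameters, and the choice of absolute residuals as the conformity score are all assumptions made for illustration.

```python
import numpy as np


def split_conformal_interval(cal_pred, cal_y, test_pred, alpha=0.1):
    """Illustrative split conformal intervals (hypothetical helper).

    cal_pred, cal_y: model predictions and true outcomes on hold-out
    (calibration) data that was NOT used to fit the model.
    test_pred: model predictions for new points.
    Returns (lower, upper) arrays targeting about (1 - alpha) marginal
    coverage, assuming calibration and test points are exchangeable.
    """
    cal_pred = np.asarray(cal_pred, dtype=float)
    cal_y = np.asarray(cal_y, dtype=float)
    test_pred = np.asarray(test_pred, dtype=float)

    n = cal_y.shape[0]
    # Conformity scores: absolute residuals on the hold-out set (an assumption;
    # other scores are possible).
    scores = np.sort(np.abs(cal_y - cal_pred))
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    k = int(np.ceil((n + 1) * (1 - alpha)))
    # If the hold-out set is too small for the requested level, the only
    # interval with guaranteed coverage is the whole real line.
    q = scores[k - 1] if k <= n else np.inf
    return test_pred - q, test_pred + q


# Usage on simulated data, with a deliberately crude "model" (predict 2 * x).
rng = np.random.default_rng(0)
x_cal = rng.normal(size=500)
y_cal = 2.0 * x_cal + rng.normal(size=500)
x_new = rng.normal(size=5)
lower, upper = split_conformal_interval(2.0 * x_cal, y_cal, 2.0 * x_new, alpha=0.1)
```

The interval width here is the same for every test point, which is one reason more refined scores and training procedures are of interest. The coverage guarantee is marginal and relies on exchangeability, so it does not by itself address the group fairness or covariate shift concerns raised in the second and third parts.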
