CAREER: Human-Centered Machine Learning: Robustness, Fairness and Dynamics

$526,935 | FY2022 | CSE | NSF

University Of California-Santa Cruz, Santa Cruz CA

Investigators

Abstract

Machine learning (ML) is increasingly used in domains that have a profound effect on people's opportunities and well-being, including healthcare, law enforcement, and consumer finance. The emphasis on the role of humans in developing a machine learning system raises significant challenges. Several arise because the data a machine learning model is trained on are, in many contexts, generated from repeated human-ML interactions over time. For example, in a loan-application decision-support system, a user whose application is initially declined may change their financial behavior to obtain a better result in future applications. Such changes in human behavior in response to ML decisions alter the distribution of data that the ML model is exposed to in training. However, the usual paradigms for training and using ML systems often assume static data distributions. This requires us to revisit existing machine learning tools and our understanding of their established robustness and fairness properties. This project’s focus on robustness, fairness, and human-ML interaction dynamics will alert machine learning practitioners to the irreparable harm that may be caused by blindly trusting existing training data. The project and its future extensions will provide frameworks and tools to build healthy machine learning development and deployment approaches that better serve humans' long-term well-being.

This project aims to provide fundamental understanding and algorithmic solutions for improving model robustness and fairness in human-centered machine learning systems. The project builds theoretical and computational frameworks to understand the changes in data induced by the deployment of machine learning models. Central to our intellectual inquiry is the observation that, in human-centered systems, humans are "responding" agents who react to the algorithmic decisions or the interactive environments (e.g., a data collection system) they are subject to.
These reactions often cause the data distribution to shift between training and deployment, which substantially challenges the common assumption that the training data are representative of the test data. The results will help advance the state of the art by providing a theoretically sound and computationally efficient framework for designing robust and fair machine learning solutions that account for distribution shifts triggered by human responses to the algorithms in real-world deployment.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
