CAREER: Data Preparation for Trusted and Fair Data Science

$466,411 | FY2023 | CSE | NSF

Purdue University, West Lafayette IN

Investigators

Abstract

Machine learning is becoming the standard choice for data science applications that involve automated decision-making across a variety of application domains. Designed carefully, learning-enabled systems have the potential to eliminate some undesirable aspects of human decision-making. Unbalanced outcomes from such systems, however, are harmful because they impede societal trust in machine learning. This project will develop novel technologies to realize the potential of robust and explainable data-driven decision-making systems. Toward this goal, the project centers on data preparation and debugging techniques to ensure that the underlying training data and data handling processes are free of unexpected errors. The project will demonstrate the importance of data quality in enabling trust in data-driven decision-making systems in practical domains.

This project will advance understanding in the field of responsible data science, particularly of how data quality issues and data preparation steps affect the quality of downstream machine learning models and data science pipelines. The technical aims of this project are divided into three thrusts, complemented by intermediate evaluation plans. The first thrust develops tools that detect the errors causing lower-quality outcomes in machine learning models and pipelines and that suggest potential data fixes to mitigate those errors. The second thrust develops approaches to assess the validity or suitability of data for learning trustworthy machine learning models. The third thrust develops a framework that brings the different human roles and their expertise to bear on data quality. Together, these techniques will enhance our understanding of how data quality and data preparation influence decision-making and will spotlight data as a tool for understanding and debugging undesired behavior of data science applications. Findings from this project will inform future research on designing more robust data science applications.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
