CAREER: Practical Privacy and Fairness for Data-Driven Applications
Carnegie Mellon University, Pittsburgh PA
Investigators
Abstract
Data-driven applications play an increasing role in people’s lives, underpinning systems and services that collect broad personal information to provide novel functionality and valuable insights. Machine learning techniques are predominantly used to implement these applications, and developers have published an array of libraries that make it easy for any programmer to benefit from this technology. While excitement over these developments has led to numerous positive contributions, it has also been accompanied by concerns about the privacy of individuals’ data and the potential for these systems to discriminate against some individuals. This project aims to get ahead of these problems by exploring verification techniques for uncovering instances of protected information use that lead to privacy loss and discrimination. Inspired by recent advances that allow attribution of predictions in machine learning models, we build on methods from software model checking and optimization to locate the components pivotal to these outcomes, and construct data representations that aid in removing them. In parallel, we are developing a deeper understanding of new types of software "bugs" that result in such harms: bias amplification, which imperils fairness, and exploitable data memorization, which introduces privacy risk. We aim to quantify the extent to which existing techniques can prevent these bugs, and to inform the development of new techniques that specifically target them. As this project progresses, we are applying the results towards educating a diverse workforce on data privacy, algorithmic fairness, and rigorous approaches to constructing software that uses machine learning effectively. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.