Collaborative Research: CIF: Medium: Fundamental Limits of Privacy-Enhancing Technologies

$225,441 | FY2023 | CSE | NSF

Harvard University, Cambridge MA

Investigators

Abstract

Balancing the preservation of individual privacy against the utility of aggregate data for societal benefit is crucial in the modern data-driven world. In fields such as healthcare, education, and resource allocation, the responsible use of personal data can bring transformative changes and fuel the development of machine learning and artificial intelligence algorithms with privacy and fairness guarantees. This project aims to improve privacy-enhancing technologies (PETs) that uphold individual privacy while allowing comprehensive data analysis. The research will result in new methods that optimize PETs for privacy while minimizing their hidden and apparent costs, such as distortion and bias. The project will also develop new methods for generating synthetic yet realistic data with privacy safeguards. Ultimately, this research will result in PETs that are more private, accurate, and fair. In practice, these improvements can impact a range of machine learning applications in industry, healthcare, and government. The project also engages students through research internships and STEM events.

The research is divided into four interconnected areas, each tackling a distinct aspect of PETs that ensure differential privacy (DP). The first area develops optimal privacy mechanisms, specifically for applications that require a large number of data processing steps, such as the gradient descent-based training algorithms used in machine learning. The second area enhances privacy accounting, aiming to derive accurate and computationally tractable methods that track DP guarantees using tools from information theory. The third area assesses the costs of privacy, scrutinizing the impact of DP not just on accuracy, but also on fairness and arbitrariness in machine learning models trained with DP-ensuring algorithms. The fourth area generates realistic synthetic data that maintains privacy and can be used for various statistical tasks. The project employs a diverse range of techniques from information theory, optimization, mathematical physics, and machine learning.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
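
As a rough illustration of why privacy accounting matters when a mechanism is applied over many processing steps, the sketch below compares two textbook composition bounds for (epsilon, delta)-DP. It is not the project's method: the function names, the choice of the classical Gaussian mechanism, and the parameter values are all assumptions made for illustration, and the bounds shown are standard results rather than the tighter information-theoretic accountants the abstract refers to.

```python
import math
import random


def gaussian_sigma(sensitivity, epsilon, delta):
    """Noise scale for the classical Gaussian mechanism.

    Uses the standard textbook calibration, which is valid for 0 < epsilon < 1.
    """
    if not (0.0 < epsilon < 1.0) or not (0.0 < delta < 1.0):
        raise ValueError("requires 0 < epsilon < 1 and 0 < delta < 1")
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def gaussian_mechanism(value, sensitivity, epsilon, delta, rng=random):
    """Release `value` with Gaussian noise calibrated to (epsilon, delta)-DP."""
    sigma = gaussian_sigma(sensitivity, epsilon, delta)
    return value + rng.gauss(0.0, sigma)


def basic_composition(epsilon, delta, steps):
    """Total guarantee after `steps` runs under basic (linear) composition."""
    return steps * epsilon, steps * delta


def advanced_composition(epsilon, delta, steps, delta_prime):
    """Total guarantee under the advanced composition theorem.

    `delta_prime` is extra failure probability spent to obtain a total
    epsilon that grows roughly with the square root of `steps`.
    """
    eps_total = (
        math.sqrt(2.0 * steps * math.log(1.0 / delta_prime)) * epsilon
        + steps * epsilon * (math.exp(epsilon) - 1.0)
    )
    return eps_total, steps * delta + delta_prime


if __name__ == "__main__":
    # Hypothetical setting: 1000 noisy steps, each (0.01, 1e-6)-DP.
    eps, delta, steps = 0.01, 1e-6, 1000
    print("one noisy release:", gaussian_mechanism(0.0, 1.0, eps, delta))
    print("basic bound:      ", basic_composition(eps, delta, steps))
    print("advanced bound:   ", advanced_composition(eps, delta, steps, 1e-5))
```

In this hypothetical setting the advanced bound gives a much smaller total epsilon than the linear bound, which is the kind of gap that more accurate accounting methods aim to narrow further.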
