SaTC: CORE: Small: Auditing Private Statistical and Machine Learning Algorithms: Theory and Practice

$600,000 · FY2023 · CSE · NSF

Northeastern University, Boston MA

Abstract

The use of statistics and machine learning to analyze sensitive data poses serious privacy risks to users who contribute their data to these algorithms, including: data reconstruction (revealing large portions of the training data), membership inference (revealing the presence of specific individuals in the training data), and data memorization (revealing specific training examples). Differential privacy has now become the standard for protecting data privacy in machine learning and statistics, as it offers a strong, formal, and quantitative guarantee of individual privacy. However, for many deployments of differential privacy there remains a significant gap between the formal guarantees and expectations in practice, and this project aims to bridge this gap by auditing deployed algorithms to discover and explain their privacy properties. The project’s novelties are building the theoretical foundations of privacy auditing and designing empirical methods that measure the privacy leakage of private algorithms in real-world scenarios. The project’s broader significance and importance will be in specific recommendations to practitioners on choice and use of private algorithms, and an understanding of the potential privacy violations for specific machine learning applications. The project team has expertise in differential privacy, machine learning, and cybersecurity, and plans a set of education tasks and outreach activities: public course materials on trustworthy machine learning and privacy, mentoring undergraduate and graduate students in research projects, and collaboration with industry partners.

This project includes three interconnected thrusts addressing different aspects of privacy auditing. The first thrust lays the theoretical foundation of privacy auditing by developing optimal membership inference attacks with stronger guarantees than previous work. The second thrust introduces novel methods for auditing convex machine learning models and neural networks, by using insights from poisoning attacks developed in adversarial machine learning. The final thrust designs tools for auditing end-to-end privacy leakage of machine learning models trained under the continual-learning paradigm. These research thrusts enable a suite of techniques for auditing private algorithms, with the goal of providing guidance to practitioners on how to select private algorithms and their parameters to balance the utility and privacy guarantees on tasks of interest.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
