CIF: Medium: Collaborative Research: Information-theoretic Guarantees on Privacy in the Age of Learning

$383,000 | FY2019 | CSE | NSF

Harvard University, Cambridge MA

Abstract

Armed with powerful advances in machine learning, the ability of an interested party to gather personal information from an individual's expanding digital footprint is outstripping anyone's capability to keep their information private. While this aggregated data can have tremendous benefit for consumers and data scientists via technologies built on machine learning and artificial intelligence, this benefit must be tempered with meaningful assurances of privacy for the very people who provided the data in the first place. This project adopts a rigorous information-theoretic approach to give meaningful privacy guarantees while still providing statistical utility. By combining theoretical and data-driven research, this project can inform public policy as well as best practices for industry. The overall goal is to provide any data scientist with a set of tools to guarantee meaningful privacy in practice. To do so, this project explores meaningful measures of privacy leakage in the learning context, characterizes the fundamental tradeoffs between privacy and utility, develops techniques to ensure privacy in realistic settings, and tests these algorithms on publicly available datasets. The project is also committed to broadening participation in computing via two outreach efforts: (i) interactive demonstrations, for middle and high school students, of privacy issues that stem from using social media, delivered at ASU's annual STEM event, Open Door, and (ii) teaching modules on machine learning (ML) and artificial intelligence (AI), and short courses ("data jams"), offered at ASU via the Young Engineers Shape the World (YESW) summer program and at Harvard. These modules, targeted at female, financially disadvantaged, and Latino and Hispanic students, aim to contribute to a more diverse STEM workforce by giving students collaborative, hands-on experience with the basic concepts of coding, manipulating datasets, and producing simple visualizations. Outreach efforts will be evaluated using well-understood metrics for assessment of student interest, engagement, and knowledge via ASU's College Research and Evaluation Services Team (CREST).

This project aims to derive a foundational, statistical theory of privacy that builds upon and contributes to modern theoretical advances in information theory and machine learning. The statistical nature of inference (both for legitimate and illegitimate ends) requires a statistical approach to measuring and ensuring privacy and utility. A significant novel element derived from this view is the maximal alpha-leakage, a new, tunable measure for information leakage which quantifies the ability of an adversary to learn any function of private data via a parametric class of loss functions. This tunable measure is derived from a rich information-theoretic framework based on Rényi divergence, thereby uniting disparate existing measures under a single framework. Moreover, its operational significance and computational flexibility allow for natural application in machine learning. In the context of these measures, this project studies privacy-utility tradeoffs both theoretically and in a data-driven manner in two distinct settings: (i) releasing datasets in a form similar to the original, with privacy and strict utility guarantees for arbitrary statistical analysis, and (ii) releasing privacy-guaranteed data representations for specific learning tasks. Dissemination of the work will go beyond conferences: the project will organize a privacy workshop in its latter half to enable interdisciplinary interaction and application.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
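
To make the Rényi-based, tunable notion of leakage described above more concrete, the following is a minimal illustrative sketch, not the project's own code. The abstract gives no formulas, so the expressions used here are assumptions taken from the standard information-theory literature: the Rényi divergence of order alpha, the Sibson mutual information of order alpha (a Rényi-type dependence measure), and maximal leakage as the alpha-to-infinity endpoint. The function names, the binary symmetric channel, and the uniform input distribution are all hypothetical choices made for illustration. The sketch does not carry out any optimization over input distributions, and it assumes finite alphabets with alpha > 0 and alpha != 1.

```python
import math

def renyi_divergence(p, q, alpha):
    """Renyi divergence D_alpha(p || q) in nats (alpha > 0, alpha != 1).

    Assumes q > 0 wherever p > 0.
    """
    total = sum(pi ** alpha * qi ** (1.0 - alpha)
                for pi, qi in zip(p, q) if pi > 0)
    return math.log(total) / (alpha - 1.0)

def sibson_mi(p_x, channel, alpha):
    """Sibson mutual information of order alpha, in nats.

    p_x:     input distribution over the private variable X
    channel: channel[x][y] = P(Y = y | X = x), the release mechanism
    """
    n_y = len(channel[0])
    total = 0.0
    for y in range(n_y):
        inner = sum(px * row[y] ** alpha for px, row in zip(p_x, channel))
        total += inner ** (1.0 / alpha)
    return alpha / (alpha - 1.0) * math.log(total)

def maximal_leakage(channel):
    """Maximal leakage: log of the sum over y of max over x of P(y | x)."""
    n_y = len(channel[0])
    return math.log(sum(max(row[y] for row in channel) for y in range(n_y)))

# Hypothetical mechanism: a binary symmetric channel with 10% flip probability.
bsc = [[0.9, 0.1],
       [0.1, 0.9]]
uniform = [0.5, 0.5]

for alpha in (1.5, 2.0, 10.0, 200.0):
    print(f"alpha={alpha:>6}: Sibson MI = {sibson_mi(uniform, bsc, alpha):.4f} nats")
print(f"maximal leakage    = {maximal_leakage(bsc):.4f} nats")
```

In this toy setting, raising alpha moves the measured leakage from an average-case quantity toward the worst-case maximal leakage, which is one way to read the "tunable" aspect mentioned in the abstract.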
