CIF: Small: Privacy and Utility of Databases: An Information-Theoretic Approach
Princeton University, Princeton NJ
Investigators
Abstract
Information technology and electronic communications have been rapidly applied to every sphere of human activity, including commerce, medicine and social networking. The concomitant emergence of myriad large, centralized, searchable data repositories has made "leakage" of private information via data correlation (inadvertently or by malicious design) an important and urgent societal problem. Maintaining the usefulness of these data sources while also providing the necessary privacy guarantees remains an important unsolved problem. This problem drives the need for an overarching analytic framework that can tell us unequivocally how safe private data can be (privacy) while still providing useful benefit (utility) to multiple legitimate information consumers. This research develops a unified framework to study the utility-privacy tradeoff irrespective of the type of data source or method of perturbation. Techniques and results from rate-distortion theory are used to model data sources, develop application-independent utility and privacy metrics, and develop a side-information model for dealing with questions of external knowledge. The framework, applicable to single-query data source models, is extended to study the utility-privacy tradeoffs for multiple-query models. Also studied is a successive disclosure problem, which draws on classic results in successive refinement to develop the conditions under which multiple queries result in no additional information loss. The universal framework developed includes tools and techniques to bridge the gap between the information-theoretic model and both current approaches and the dominant theoretical framework in computer science.