CIF: Small: Foundations of Decentralized Data Science: Optimizing Utility, Privacy and Communication Efficiency
Stanford University, Stanford CA
Investigators
Abstract
The deluge of data generated daily across mobile devices, sensors, and servers holds the promise of unprecedented inferential power to revolutionize numerous industries and scientific domains, from medicine to engineering to infrastructure. Traditional data science, which pools this data in a single location, is becoming increasingly unrealistic due to bandwidth limitations in communication networks. Legal, administrative, and ethical constraints on sharing proprietary, personal, or sensitive data pose further challenges on the path to realizing this promise. The project pursues the following question: Can one extract value from data generated across an entire network without having to collect and process it in a single location? The broad goal of this project is to harness the inferential power of distributed data without the systemic privacy risks and costs resulting from traditional data collection. For a wide range of canonical data science tasks, the project pursues decentralized schemes that exchange narrowly scoped messages to complete the desired task. These messages are designed to preserve the privacy of the user data and its sensitive characteristics while minimizing the total communication cost. As a result, the schemes provide optimal trade-offs between accuracy for the desired task, privacy for the user data, and communication efficiency. The schemes adapt to the structure of the underlying data and network and, when available, can leverage low intrinsic dimensionality of the data and multi-round interactions over the network. This project also develops information-theoretic performance benchmarks that delineate what is impossible under privacy and communication constraints and establish optimality of the proposed schemes under various criteria. As such, the project delivers a rigorous and comprehensive theoretical foundation for decentralized data science that allows many canonical tasks to be efficiently and privately implemented on distributed data.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.