CIF: Small: Deep Stochasticity for Private Collaborative Deep Learning

$386,000 | FY2022 | CSE | NSF

Clemson University, Clemson SC

Investigators

Abstract

Deep learning now achieves unprecedented performance in applications ranging from computer vision to natural language processing to drug design. Training such models usually requires large volumes of data. These data are often collected from multiple individuals or organizations to ensure heterogeneity, since homogeneous data may lead to overfitting. Training data often contain sensitive information, e.g., healthcare records, browsing history, or financial transactions, and therefore pose privacy threats to the individuals from whom the data were collected. Multi-machine collaborative learning, such as decentralized learning and federated learning, is often claimed to resolve these privacy concerns because the raw training data never leave the participating machines. Recent studies, however, paint a very different picture: not only can features of the training data be inferred from shared gradients or model updates, but even the raw data can be reconstructed from these shared gradients. Moreover, adding noise to shared gradients, a de facto standard for achieving differential privacy, is effective only when the noise is sufficiently large, which can degrade training accuracy. This project instead seeks to protect the privacy of participating machines through judicious randomization that exploits the structure of collaborative learning algorithms and leverages their natural resilience to error. The project will enrich the current curriculum with new modules on privacy-preserving decentralized learning for both undergraduate and graduate classes. Broadening Participation in Computing will be addressed through outreach activities involving minority students via Clemson PEER (Programs for Educational Enrichment and Retention) and WISE (Women in Science and Engineering).
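
The claim that raw data can be reconstructed from shared gradients is easiest to see in a well-known special case. When a single training sample passes through a fully connected layer with a bias term, each row of the weight gradient equals the input scaled by the corresponding bias gradient. Dividing one by the other therefore returns the input exactly. The sketch below assumes a squared loss and small made-up layer sizes, and every function name in it is hypothetical, not taken from the project. The noise helper mimics the Gaussian perturbation used in DP-SGD but omits gradient clipping and privacy accounting, so it carries no formal differential privacy guarantee. It only shows that noise perturbs the attacker's reconstruction.

```python
import random


def linear_layer_grads(W, b, x, y):
    """Gradients of L = 0.5 * ||W x + b - y||^2 for one sample.

    With delta = W x + b - y: dL/dW[i][j] = delta[i] * x[j] and dL/db[i] = delta[i].
    """
    z = [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]
    delta = [zi - yi for zi, yi in zip(z, y)]
    dW = [[d * xj for xj in x] for d in delta]
    db = list(delta)
    return dW, db


def reconstruct_input(dW, db):
    """Attacker's view: x[j] = dW[i][j] / db[i] for any row i with db[i] != 0."""
    i = max(range(len(db)), key=lambda k: abs(db[k]))  # pick the largest, best-conditioned row
    return [g / db[i] for g in dW[i]]


def add_gaussian_noise(dW, db, sigma, rng):
    """Add i.i.d. N(0, sigma^2) noise to every shared gradient entry (no clipping)."""
    return ([[g + rng.gauss(0.0, sigma) for g in row] for row in dW],
            [g + rng.gauss(0.0, sigma) for g in db])


def leakage_error(sigma, seed=0):
    """Max absolute error between a private input and its reconstruction."""
    rng = random.Random(seed)
    x = [0.3, -1.2, 0.8, 2.0]   # hypothetical private training input
    y = [1.0, 0.0]              # its hypothetical label
    W = [[rng.uniform(-1.0, 1.0) for _ in x] for _ in y]
    b = [0.1, -0.2]
    dW, db = linear_layer_grads(W, b, x, y)
    dW, db = add_gaussian_noise(dW, db, sigma, rng)  # what a machine would share
    x_hat = reconstruct_input(dW, db)
    return max(abs(a - c) for a, c in zip(x, x_hat))


if __name__ == "__main__":
    print(leakage_error(0.0))  # exact recovery up to floating-point rounding
    print(leakage_error(0.5))  # noise perturbs the reconstruction
```

With `sigma = 0` the input is recovered to within floating-point rounding. How much noise is needed to make the recovery useless, and what that costs in accuracy, is the trade-off the abstract points to.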
The project explores several approaches to judiciously embedding stochasticity at the algorithmic level, so-called deep stochasticity, to enable privacy protection in the collaborative learning process. The proposed approach exploits the natural resilience of deep learning algorithms to errors and noise in their parameters. It is designed to enable privacy without compromising accuracy or incurring heavy computation or communication overheads, while remaining flexible enough to accommodate additional mechanisms such as cryptography. The techniques apply both to parameter-server-free decentralized learning and to parameter-server-facilitated federated learning. The main research thrusts center on designing collaborative learning algorithms that use stochastic quantization schemes for inter-machine communication and random learning stepsizes when building the iterates. Rigorous analysis frameworks will be developed to quantitatively evaluate the strength of the privacy protection achieved, and the theoretical results will be systematically validated through experiments with robot networks. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
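
The abstract names two ingredients: stochastic quantization of inter-machine messages and random learning stepsizes. It does not specify the quantizer, the mixing rule, the network topology, or the stepsize distribution. The following is therefore a hypothetical sketch of how the two ingredients can fit into one decentralized gradient method, not the project's algorithm. The sketch makes several assumptions. It uses unbiased stochastic rounding as the quantizer. The agents sit on a ring and mix in difference form with weight `eps`. Each agent's stepsize is a uniform random multiplier on a `1/sqrt(k+1)` schedule. The losses are scalar quadratics `f_i(x) = 0.5 * (x - targets[i])**2`, whose sum is minimized at the mean of `targets`. All names and parameters are illustrative, and the sketch makes no claim about privacy strength.

```python
import math
import random


def stochastic_quantize(x, delta, rng):
    """Unbiased stochastic rounding of x onto the grid {m * delta : m integer}.

    x is rounded up with probability equal to its fractional position between
    the two neighbouring grid points, so E[Q(x)] = x.
    """
    scaled = x / delta
    low = math.floor(scaled)
    frac = scaled - low  # in [0, 1)
    return delta * (low + (1 if rng.random() < frac else 0))


def decentralized_gd(targets, iters=3000, delta=0.05, eps=1.0 / 3.0,
                     base_step=0.3, seed=0):
    """Hypothetical quantized decentralized gradient descent on a ring (n >= 3).

    Agent i privately holds f_i(x) = 0.5 * (x - targets[i]) ** 2, so the
    minimizer of sum_i f_i is mean(targets).
    """
    rng = random.Random(seed)
    n = len(targets)
    x = [0.0] * n
    for k in range(iters):
        # 1. Each agent transmits only a stochastically quantized copy of its iterate.
        q = [stochastic_quantize(xi, delta, rng) for xi in x]
        new_x = []
        for i in range(n):
            left, right = (i - 1) % n, (i + 1) % n
            # 2. Difference-form mixing: the terms cancel over all agents,
            #    so mixing never shifts the network average.
            mix = (q[left] - q[i]) + (q[right] - q[i])
            # 3. Each agent draws its own random stepsize.
            step = base_step * rng.uniform(0.5, 1.5) / math.sqrt(k + 1)
            grad = x[i] - targets[i]
            new_x.append(x[i] + eps * mix - step * grad)
        x = new_x
    return x


if __name__ == "__main__":
    # The optimum of the summed losses is the mean of the targets, 5.0.
    print(decentralized_gd([1.0, 3.0, 5.0, 7.0, 9.0]))
```

Unbiased rounding means quantization errors average out rather than accumulating as a bias, and the randomized stepsize has the same mean for every agent. As a result, the iterates in this toy setting still settle near the optimum even though no agent ever sends its exact state.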
