Collaborative Research: Privacy-Sensitive Data Mining From Multi-Party Distributed Data

$164,203 | FY2003 | CSE | NSF

Washington State University, Pullman WA

Investigators

Abstract

A growing number of data mining applications must deal with data sources that are distributed, possibly proprietary, and privacy-sensitive. This has led to the development of several privacy-preserving data mining techniques. Many of these algorithms use randomized techniques to perturb the data, preserving privacy while still guaranteeing that the underlying patterns remain invariant. This proposal first points out that the popular naive randomization of data (more specifically, perturbation with additive random matrices) may preserve little privacy in many cases. It proposes a framework, based on the theory of random matrices, for understanding these data-masking techniques; the framework exposes the problems of some existing privacy-preserving data mining techniques and identifies potential research directions for solving them. The proposal also calls for the development of a collection of randomized techniques that are provably correct in supporting privacy-sensitive applications. It proposes a randomized projection-based technique to compute statistical aggregates from multi-party distributed data, and it suggests research on a privacy-sensitive distributed Bayesian network learning algorithm for similar applications. The proposed algorithms will be developed, tested, and evaluated using a collection of testbeds that the PIs have been developing for the last several years.
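The abstract does not specify the projection-based protocol. The sketch below is therefore only a generic illustration of how a random projection can yield one statistical aggregate, the inner product of two privately held vectors, and it should not be read as the award's actual method. It assumes two sites share a random seed that the party computing the estimate does not know. Each site multiplies its vector by the same k x n Gaussian matrix R, scaled by 1/sqrt(k), and releases only the k projected values. Because E[R^T R] = k I, the inner product of the projections is an unbiased estimate of the inner product of the originals. The names `project`, `estimate_inner_product`, and `SEED` are hypothetical.

```python
import math
import random


def project(vec, k, seed):
    """Return R @ vec / sqrt(k), where R is a k x n matrix of i.i.d. N(0, 1)
    entries regenerated from `seed` (hypothetical shared secret between sites).

    Rows are drawn in a fixed order, so every site that calls this with the
    same seed and the same vector length n uses exactly the same matrix R.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    projected = []
    for _ in range(k):
        row = [rng.gauss(0.0, 1.0) for _ in range(len(vec))]
        projected.append(scale * sum(r * v for r, v in zip(row, vec)))
    return projected


def estimate_inner_product(u, v):
    """Unbiased estimate of <x, y> from u = project(x, k, s) and v = project(y, k, s).

    <u, v> = x^T R^T R y / k, and E[R^T R] = k * I for Gaussian R,
    so the expectation of <u, v> is exactly <x, y>.
    """
    return sum(a * b for a, b in zip(u, v))


# Illustrative usage: site A holds x and site B holds y (made-up data).
SEED = 2003
x = [float(i % 7) for i in range(40)]
y = [float((3 * i) % 5) for i in range(40)]
u = project(x, k=20, seed=SEED)  # only 20 numbers leave site A
v = project(y, k=20, seed=SEED)  # only 20 numbers leave site B
print("estimate:", estimate_inner_product(u, v))
print("exact:   ", sum(a * b for a, b in zip(x, y)))
```

With k < n, R is not invertible, so a projected vector does not determine the original uniquely. The variance of the estimate shrinks in proportion to 1/k, which makes k the knob that trades accuracy against disclosure in this sketch.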
