EAGER: Efficient Privacy-aware Document Search in the Cloud
University of California-Santa Barbara, Santa Barbara, CA
Investigators
Abstract
As sensitive information is increasingly stored in the cloud, privacy protection is a critical factor in users' adoption of cloud-based information services such as document search. A cloud server can observe the client-initiated query processing flow, extract statistical patterns, and reason about the client's data. As a result, searching in the cloud carries a risk of leakage-abuse attacks. The main challenge in performing privacy-preserving search is that index visitation can reveal sensitive data patterns, and the computation involved in advanced ranking can further expose private feature information. On the other hand, hiding index and feature information through full encryption prevents the server from performing effective scoring and result comparison. This project explores challenging open problems in algorithmic indexing and ranking solutions for privacy-aware cloud data search. The approach emphasizes an evaluation-driven design in which search performance is assessed along multiple dimensions (relevance, efficiency, and privacy) for practical system deployment. The project integrates the proposed research with an educational plan that includes undergraduate and graduate student involvement in the research project, instructional material development, and outreach activities.
The exploratory research addresses two fundamental research challenges: (1) privacy-aware indexing and runtime support for matching documents to a given query, with an emphasis on curtailing statistical text information leakage while providing efficient and private access to ranking features; and (2) privacy-aware end-to-end top-K ranking with a multi-stage scheme that seeks a combination of linear and nonlinear methods such as neural nets and learning ensembles. The design goal is to minimize the leakage of document features and characteristics while still achieving a reasonable response time and competitive relevance.
The evaluation process will use public datasets to assess the effectiveness of the developed techniques for practical system deployment. This research effort will help bridge the gap between privacy and advanced information retrieval in searching large encrypted datasets. The research results will be made publicly available to the research and industry communities. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.