Collaborative Research: IMR: MM-1B: Automating Privacy-Preserving Data Sharing of Campus Network Traffic Logs
Virginia Polytechnic Institute and State University, Blacksburg, VA
Investigators
Abstract
Internet measurement data plays a critical role in enabling innovations in fields such as networking and cybersecurity monitoring. However, such data often contains sensitive user information, which cannot be shared publicly and is only accessible to a limited number of trusted researchers with established collaborations with data owners. This limitation hinders the reproducibility of research results and the use of collected data by the research community. This project develops a systematic approach to facilitate data sharing of campus network traffic logs from the University of Virginia (UVA) and Virginia Tech (VT) with the broader research community. The project's novelties are (i) data anonymization: automating the discovery and anonymization of sensitive data in network traffic logs and (ii) data sharing: modularizing the data access through a privacy-preserving framework. The project's broader significance and importance are (i) expanding the number of researchers who can use the campus network traffic logs from UVA and VT, and serving as a reference for other campuses and networks that share Internet measurement data; and (ii) developing curricula that engage graduate and undergraduate students in research activities on the campus network logs. Specifically, the project (i) develops an extensible and adaptable model to identify Personally Identifiable Information (PII) for anonymization in structured and semi-structured network logs in an offline setting, fine-tunes the model on historical logs, and transitions the model to a tunable service capable of supporting live data feeds and offline repositories; (ii) modularizes varying levels of data access based on research needs and constructs a privacy-preserving framework with differential privacy to provide provable privacy guarantees to restricted levels of data access; and (iii) integrates the privacy-preserving framework back into data anonymization to achieve dynamic anonymization based on usage patterns.
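To make the two building blocks named above concrete, the following is a minimal Python sketch, not the project's implementation. It assumes that the only PII of interest is IPv4 addresses, that pseudonymization uses a keyed hash (HMAC-SHA256), and that the restricted access level answers a single counting query with the Laplace mechanism under record-level privacy (sensitivity 1). The names `pseudonymize_line`, `noisy_count`, and `IPV4_RE` are hypothetical, and the floating-point noise sampling is not hardened for production use.

```python
import hashlib
import hmac
import random
import re

# Assumption: only IPv4 addresses are treated as PII in this sketch.
# Real traffic logs contain many other identifier types.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def pseudonymize_line(line, key):
    """Replace each IPv4 address with a keyed-hash token.

    The same address always maps to the same token under one key,
    so joins across log lines still work after anonymization.
    """
    def _token(match):
        digest = hmac.new(key, match.group(0).encode(), hashlib.sha256)
        return "ip_" + digest.hexdigest()[:12]

    return IPV4_RE.sub(_token, line)


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng):
    """Laplace mechanism for a counting query.

    Assumption: adding or removing one log record changes the count
    by at most 1, so the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    key = b"example-secret-key"  # hypothetical key, for illustration only
    log = [
        "2024-01-01T00:00:01 src=10.0.0.1 dst=192.168.1.5 bytes=512",
        "2024-01-01T00:00:02 src=10.0.0.1 dst=192.168.1.9 bytes=128",
    ]
    for entry in log:
        print(pseudonymize_line(entry, key))
    print(noisy_count(len(log), epsilon=0.5, rng=random.Random()))
```

A smaller `epsilon` adds more noise and gives a stronger guarantee. Because one user can generate many log records, a user-level guarantee would need a larger sensitivity bound than the record-level value of 1 assumed here.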
This project deploys and evaluates the proposed approach directly on the UVA and VT traffic collection pipeline in a practical real-world setting. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.