CICI: RDP: Enforcing Security and Privacy Policies to Protect Research Data

$924,503FY2019CSENSF

University Of Virginia Main Campus, Charlottesville VA

Investigators

Abstract

Advances in computer systems over the past decade have laid a solid foundation for data collection at a staggering scale. Data generated from end-user devices has tremendous value to the research community. For example, mobile and Internet-of-Things devices can participate in large-scale Internet-based measurement or monitoring of patient's health conditions. While ground-breaking discovered may occur, malicious attacks or unintentional data leaks threaten the research data. Such a threat is hard to predict and difficult to recover from once it happens. Preventative and defensive measures should be taken where data is generated in order to protect private, valuable data from the attackers. Currently, there are efforts that try to regulate data management, for example, a research application might have a privacy policy that describes how the user data is being collected and protected. However, there is a disconnect between these documented policies and the implementations of a research project. In this project, the investigators propose to interpret the documented policies and enforce them in research projects, in order to protect the privacy of research data. This work can significantly reduce researchers' overhead in implementing policy-compliant code and reduce the complexity of protecting research datasets. In this project, the investigators provide a solution that protects research data using policies mandated by different regulatory entities, such as an application store and an Institutional Review Board (IRB). The system utilizes Natural Language Processing (NLP) techniques to extract security and privacy requirements from unstructured regulatory documents and translates these requirements to code that can patch a program that does not comply with the policies. The solution covers the lifetime of research data protection, from data collection to data storage, and data processing. This research has two thrusts. First, the investigators will build novel NLP techniques to extract security and privacy policies from unstructured, sparsely-labeled documents such as IRB protocols, and privacy disclosure of research applications. Second, the investigators will enforce these extracted policies in code, through context-aware program analysis to discover inconsistencies between a researcher's implementation and the extracted policies, and instrument researcher?s code to enforce compliant program behavior. The results of this work will have a transformative impact on the development of the next generation research data protection techniques, and more defensive security and privacy practices. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

View original record on NSF Award Search →