CAREER: Interpretable Provenance Analysis for Heterogeneous Systems at Scale

$83,594 | FY2023 | CSE | NSF

Rutgers University New Brunswick, New Brunswick NJ

Abstract

The number of cybercrime incidents and the complexity of modern attacks are both increasing, making forensic analysis more challenging. Provenance analysis is a common practice in forensic analysis: it records historical system execution events and converts them into causal graphs that follow the dependencies among events. Investigators can identify attack root causes and the resulting damage from such graphs and leverage the learned knowledge for attack detection or security enforcement. The project's novelty is scalable and interpretable provenance collection software for heterogeneous systems that include cutting-edge artificial intelligence components. The project's broader significance and importance lie in building the foundation for reasoning about the opaqueness of modern complex systems and in training the research and security analysis skills of students and security professionals. Owing to the practicality of the provenance collection software, the developed techniques improve modern computing systems' resilience to cyber-attacks. The attack traces generated by this project, including the labeled and cleansed ones, will support further research in multiple areas, such as cyber security and big-data analysis. Specifically, the project develops a scalable provenance collection system by designing a new system architecture that coordinates individual kernel components. It optimizes the provenance storage system by introducing a novel lossless compression schema. On top of these frameworks, the project builds interpretable provenance analysis and on-the-fly attack detection methods through program analysis-enabled semantic labeling and artificial intelligence-based behavior analysis. The training methods provide new capabilities for learning from highly biased and unlabeled audit data at a scale exceeding that of most existing data-driven applications.
The project team also devises novel causality analysis mechanisms for deep neural network modules in emerging heterogeneous systems. This technique improves model interpretability in the presence of an attack, especially for models used in mission-critical applications such as autonomous driving, identity recognition, and private property surveillance. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
