CRII: SaTC: Robust Explainable Provenance-based Intrusion Detection
Wake Forest University, Winston Salem NC
Abstract
Modern intrusion detection systems detect ongoing cyberattacks based on knowledge of a computer system’s activity history, also known as data provenance. However, deploying them in the real world is challenging. The project’s novelty is the development of a next-generation intrusion detection system that not only accurately identifies an intrusion but also precisely diagnoses its root cause and method of attack, even when an attacker actively tries to evade detection. The project’s broader significance and importance are (1) delivering timely solutions to real-world cybersecurity threats that increasingly confront both the U.S. government and large corporations and affect the security and privacy of millions of people, (2) fostering a collaborative security community by organizing a workshop on building and disseminating reproducible intrusion detection experiments, and (3) training first-generation and LGBTQ students to improve the representation of underrepresented groups in the security workforce.
The project addresses three challenges that imperil the efficacy and practical adoption of provenance-based intrusion detection systems. First, these systems cannot precisely explain the cause and progression of an attack. Second, they are ineffective when an attacker purposefully tries to evade them. Third, they require an abundance of provenance data that is difficult to obtain. To address these problems, this project designs a novel intrusion detection system that leverages machine learning to highlight anomalous computer activity indicating an intrusion. The machine learning algorithm focuses on explaining the attack, reducing the manual effort required to triage intrusion alerts. To expose the shortcomings of existing intrusion detection systems, the project first studies new intrusion strategies that simultaneously attack a host computer system and evade detection. A countermeasure that introduces randomness into learning and strengthens its robustness is then incorporated into the machine learning pipeline to mitigate such attacks. Finally, the project leverages software engineering and program analysis techniques to synthesize benign provenance data for training the intrusion detection system, and integrates these techniques into an automated data-generation framework. A successful project will advance the state of the art in endpoint intrusion detection and response, improve the analyst experience, and enhance the security of cyberinfrastructure critical to the government and other organizations.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
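To make the general idea of "highlighting anomalous provenance activity and explaining it" concrete, here is a minimal, hypothetical Python sketch. It assumes provenance is recorded as timestamped (subject, action, object) edges, scores each edge by how rare its pattern was in benign training data, and then backtracks the most anomalous edge's causal ancestors to show how the attack progressed. The `Event`, `train`, `score`, `backtrack` and `detect` names, the frequency-based score, the threshold, and the toy events are all illustrative assumptions. The abstract does not specify the project's machine learning model, so this is not the project's design.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One provenance edge: subject `src` performed `action` on object `dst` at time `ts`."""
    ts: int
    src: str
    action: str
    dst: str


def train(benign):
    """Count (src, action, dst) patterns and per-subject event totals in benign provenance."""
    pattern_counts = Counter((e.src, e.action, e.dst) for e in benign)
    src_totals = Counter(e.src for e in benign)
    return pattern_counts, src_totals


def score(model, e, alpha=1.0):
    """Rarity score: -log of a Laplace-smoothed estimate of P(action, dst | src)."""
    pattern_counts, src_totals = model
    vocab = len(pattern_counts) + 1  # +1 reserves probability mass for unseen patterns
    hits = pattern_counts[(e.src, e.action, e.dst)]  # Counter returns 0 for unseen keys
    p = (hits + alpha) / (src_totals[e.src] + alpha * vocab)
    return -math.log(p)


def backtrack(events, alert):
    """Explain an alert by following causal ancestors back in time:
    at each step, pick the latest earlier event whose object is the current subject."""
    chain = [alert]
    current = alert
    while True:
        parents = [e for e in events if e.dst == current.src and e.ts < current.ts]
        if not parents:
            return list(reversed(chain))  # oldest cause first
        current = max(parents, key=lambda e: e.ts)
        chain.append(current)


def detect(model, events, threshold):
    """Return (score, causal chain) for the most anomalous event, or None if none reaches threshold."""
    worst = max(events, key=lambda e: score(model, e))
    worst_score = score(model, worst)
    if worst_score < threshold:
        return None
    return worst_score, backtrack(events, worst)


# Toy data (hypothetical): benign logins spawn shells that read a profile file.
benign = [Event(t, "proc:sshd", "fork", "proc:bash") for t in range(50)]
benign += [Event(t, "proc:bash", "read", "file:/etc/profile") for t in range(50)]

# Toy attack (hypothetical): the shell launches a downloader that phones home and drops a file.
attack = [
    Event(1, "proc:sshd", "fork", "proc:bash"),
    Event(2, "proc:bash", "exec", "proc:curl"),
    Event(3, "proc:curl", "connect", "net:203.0.113.7:80"),
    Event(4, "proc:curl", "write", "file:/tmp/implant"),
]

model = train(benign)
result = detect(model, attack, threshold=2.0)
if result is not None:
    s, chain = result
    print(f"alert score {s:.2f}")
    for e in chain:
        print(f"  t={e.ts}: {e.src} --{e.action}--> {e.dst}")
```

On the toy data, the shell executing `curl` is flagged, and the backtracked chain `sshd -> bash -> curl` is the kind of root-cause explanation an analyst could triage quickly. Because subjects with no benign history (such as `proc:curl`) receive only a moderate score under this simple smoothing, a practical detector would need a better prior. It would also need the robustness to deliberate evasion that the abstract describes, which this sketch does not attempt.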