OAC Core: Small: Devising Data-driven Methodologies by Employing Large-scale Empirical Data to Fingerprint, Attribute, Remediate and Analyze Internet-scale IoT Maliciousness

$496,898 | FY2019 | CSE | NSF

University of Texas at San Antonio, San Antonio, TX

Investigators

Abstract

At least 20 billion devices will be connected to the Internet by 2023. Many of these devices transmit critical and sensitive system and personal data in real time. Collectively known as "the Internet of Things" (IoT), they represent a market worth $267 billion per year. As valuable as this market is, security spending on the sector barely exceeds 1%. Indeed, while vendors continue to push more IoT devices to market, the security of these devices has often fallen in priority, making them easier to exploit. This severely threatens the privacy of consumers and the safety of mission-critical systems. While a number of research efforts are under way to address the IoT security problem, several challenges hinder their success. These include the lack of monitoring capabilities once IoT devices are deployed, the shortage of remediation techniques once they are compromised, and the inadequacy of methodologies for understanding the underlying malicious IoT infrastructures. To this end, this project will serve NSF's mission to promote the progress of science by developing data science methodologies to identify and remediate infected IoT devices in near real time. The project will also promote cyber security research and training for minorities and K-12 students. Moreover, the project will contribute to operational cyber security by developing a large-scale cyberinfrastructure for IoT-relevant data and threat sharing, enabling hands-on cyber-science at large. The project will scrutinize close to 100 GB/hr of real-time unsolicited Internet-scale traffic to devise and develop efficient deep learning classifiers that fingerprint IoT devices, identify their types and vendors, and disclose their large-scale vulnerabilities and hosting environments.
The project will design and develop fast greedy approximation algorithms for L1-norm Principal Component Analysis (PCA) data-dimensionality reduction, enabling the real-time execution of the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) technique for detecting and attributing IoT-orchestrated botnets. The project will also design scalable offensive security algorithms based on Internet-wide active measurements to offer macroscopic remediation strategies. The project will curate close to 3.5 million malware samples/day and around 1.3 million passive DNS records/day to build graph-theoretic models that uncover and characterize the inter-related components forming the malicious IoT cyberinfrastructure. Further, the project will analyze the evolution of such infrastructures to comprehend their modus operandi by devising efficient, linear-time graph similarity techniques, designing and implementing algorithms rooted in graph kernels and min-hashing methods. The project will also (i) develop a unique cyberinfrastructure for IoT empirical data and cyber threat indexing and sharing, (ii) automate the devised algorithms and techniques by leveraging high-speed, in-memory data processing technologies, (iii) generate IoT-specific detection signatures by exploring fuzzy hashing algorithms, and (iv) enable at-large access to the generated IoT artifacts through a secure API and a front-end mechanism. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
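
As an illustration of the min-hashing idea mentioned in the abstract, the following minimal Python sketch estimates how similar two snapshots of an infrastructure graph are by comparing MinHash signatures of their edge sets. It is not the project's algorithm: the function names, the node labels, the choice of BLAKE2b as the hash family, and the signature length of 256 are all assumptions made for illustration, and graph kernels are not covered. Building a signature takes time linear in the number of edges, and comparing two signatures takes time independent of graph size.

```python
import hashlib


def edge_set(edges):
    """Normalize an undirected edge list into a set of canonical 'u|v' strings."""
    return {"|".join(sorted((u, v))) for u, v in edges}


def minhash_signature(items, num_hashes=256):
    """Return one minimum hash value per salted hash function.

    Assumes `items` is a non-empty set of strings. Each of the `num_hashes`
    passes is linear in len(items).
    """
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "big")  # a distinct salt acts as a distinct hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(item.encode(), digest_size=8, salt=salt).digest(),
                "big",
            )
            for item in items
        ))
    return signature


def estimate_similarity(sig_a, sig_b):
    """Fraction of matching positions, an estimate of the Jaccard similarity."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


# Hypothetical snapshots of the same infrastructure on two days.
day1 = edge_set([("bot-1", "c2-a"), ("bot-2", "c2-a"),
                 ("c2-a", "dns-x"), ("bot-3", "c2-b")])
day2 = edge_set([("bot-1", "c2-a"), ("bot-2", "c2-a"),
                 ("c2-a", "dns-y"), ("bot-4", "c2-b")])

# Exact Jaccard similarity of the edge sets: 2 shared edges out of 6 distinct ones.
exact = len(day1 & day2) / len(day1 | day2)
estimate = estimate_similarity(minhash_signature(day1), minhash_signature(day2))
print(f"exact Jaccard = {exact:.3f}, MinHash estimate = {estimate:.3f}")
```

The estimate converges on the exact Jaccard value as the signature grows. With 256 hash functions the standard error here is roughly 0.03, so a longer signature buys accuracy at proportional cost.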
