SaTC: CORE: Small: Towards Robust and Scalable Search of Binary Code and Data
University of California-Riverside, Riverside CA
Abstract
The problem of binary code and data search concerns how to glean valuable information from binary code and binary data in an accurate, scalable, and robust fashion. This problem is central to many security tasks, including vulnerability scanning, code plagiarism detection, software lineage, malware classification, memory forensics, virtual machine introspection, and malicious document detection. Although the problem is not new and many solutions have been proposed, no existing solution achieves accuracy, scalability, and robustness simultaneously. The bottlenecks lie in the search schemes themselves: pairwise comparison for binary code search does not scale to large code bases, and rule-based binary data search is too rigid, and thus not robust against changes caused by different platform versions or by malicious manipulation.
The proposed work takes a novel approach to binary code and data search, one that mimics how the human brain recognizes interesting objects within an enormous amount of visual information. It has two research thrusts: 1) scalable cross-platform binary code search, which aims to quickly identify semantically equivalent or similar code in a large binary code base spanning different architectures, by automatically learning high-level features from binary code via clustering and deep learning; and 2) adaptive, efficient, and robust binary data analysis, which aims to accurately identify objects in binary data such as memory dumps and documents, by constructing deep neural network models.
Because binary code and data search are foundational to many security applications, advances in these foundations can push the boundary for all of the security applications built on top of them. Moreover, successful application of deep learning to binary code and data search will revolutionize how many security problems are solved in general and stimulate more research in the direction of security by deep learning.