CRI: CI-New: Collaborative Research: NJR: A Normalized Java Resource
University Of California-Irvine, Irvine CA
Investigators
Abstract
Research in programming languages and software engineering has increasingly become a "big data" science ("Big Code"), in which researchers want to use large code bases to experiment with new techniques. Specifically, Big Code will enable novel tools in areas such as security enhancers, bug finders, and code synthesizers. This project will build a community resource of 100,000 executable Java programs together with a set of working tools and an environment for conducting such research. This Normalized Java Resource (NJR) will lower the barrier to implementation of new tools, speed up research, and ultimately help advance research frontiers. Additionally, NJR can be the foundation of new courses on software tools that take advantage of Big Code. Finally, NJR can be the centerpiece of a discussion about better benchmark suites in general. The two investigators will work with collaborators from five countries.

Researchers gain significant advantages from using NJR. They can write scripts that base their new tool on NJR's already-working tools, and they can search NJR for programs with desired characteristics. They will receive the search result as a container that they can run either locally or on a cloud service. Additionally, they benefit from NJR's normalized representation of each Java program, which enables scalable running of tools on the entire collection. Finally, they will find that NJR's collection of programs is diverse because of the investigators' efforts to run clone detection and near-duplicate removal.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.