EAGER: Automatic Identification of Bug Description Elements

$200,000 · FY2018 · CSE · NSF

University of Texas at Dallas, Richardson, TX

Abstract

When an application does not behave as intended or as its users expect, users often communicate the problem via a bug report, which developers then use to identify and fix the problem. To submit a bug report, users rely on issue trackers, which allow them to describe the problem they encountered in natural language. One problem in bug reporting is the perception gap between bug reporters and developers. Those who report a bug typically have only functional knowledge of an application, even if they have development experience themselves, whereas the software developers have intimate code-level knowledge. Consequently, the information in bug reports is often incomplete, potentially incorrect, or hard to comprehend, which leads developers to spend excessive manual effort trying to identify the real source of the problem. This project aims to automatically analyze bug descriptions written in natural language and identify the parts that correspond to the observed behavior of the application, the expected behavior, and the steps that describe what the user did when encountering the problem. The ability to automatically identify these parts of a bug description is important because it enables further analysis to determine the quality of the reported information and supports developers in solving the problem. In the long run, this award will lead to a new type of bug reporting system that automatically helps users better describe the problem behaviors they notice and, in turn, helps developers address software problems more productively. The project will also support defining best practices in bug reporting, to be used by software users across the world. The project combines well-established and highly innovative research solutions from natural language processing, automated discourse analysis, and machine learning.
Specifically, the project addresses discourse semantics at the statement level, rather than the bug report level, and tackles the difficult challenge of bug content disambiguation. It also addresses the problem of identifying relationships between bug description elements, which is essential in supporting future work on automated bug reproduction. The main solution relies on neural networks, which require a substantial amount of manual coding of bug reports. The resulting set of annotated bug reports could support research beyond this project, such as the translation of natural language test sequences or scenarios into fully automated test cases. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
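
To make the statement-level task concrete, the following Python sketch labels each sentence of a bug description as observed behavior, expected behavior, steps to reproduce, or other. It is only an illustration under stated assumptions: the project describes a neural network approach trained on manually coded bug reports, whereas this sketch substitutes a simple keyword heuristic. The cue patterns, label names, and function names are all hypothetical and are not taken from the project.

```python
import re

# Hypothetical cue patterns, chosen purely for illustration.
# The project itself describes neural networks trained on annotated reports.
CUES = {
    "expected": re.compile(
        r"\b(should|expected|expect|supposed to|ought to)\b", re.I),
    "observed": re.compile(
        r"\b(crash(?:es|ed)?|errors?|fails?|failed|instead|does not|"
        r"doesn't|cannot|can't|freezes?)\b", re.I),
    "steps": re.compile(
        r"^\s*(\d+[.)]|[-*])\s+|"
        r"\b(click|open|select|tap|enter|type|go to|navigate)\b", re.I),
}


def split_sentences(text):
    """Naive splitter: break after ., !, or ? unless the punctuation
    directly follows a digit (so list markers like '1.' stay attached),
    and also break on newlines."""
    parts = re.split(r"(?<=[^\d\s][.!?])\s+|\n+", text)
    return [p.strip() for p in parts if p.strip()]


def label_sentence(sentence):
    """Return the first label whose cue pattern matches, else 'other'."""
    for label in ("expected", "observed", "steps"):
        if CUES[label].search(sentence):
            return label
    return "other"


def label_report(text):
    """Label every sentence in a bug description."""
    return [(s, label_sentence(s)) for s in split_sentences(text)]


if __name__ == "__main__":
    report = ("1. Open the settings page. 2. Click Save. "
              "The app crashes with an error. "
              "The settings should be saved.")
    for sentence, label in label_report(report):
        print(f"{label:9s} {sentence}")
```

A keyword heuristic like this cannot resolve ambiguous statements, for example a sentence that mixes an action with its unexpected result, which is the kind of content disambiguation the project targets with learned models.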
