SHF: Small: Collaborative Research: Text Retrieval in Software Engineering 2.0
Florida State University, Tallahassee FL
Investigators
Abstract
Software systems contain large amounts of textual information captured in various software artifacts, such as requirements documents, source code, and user manuals. The productivity of software developers and the quality of the software they produce depend directly on their ability to retrieve and understand this textual information. Since humans cannot process and comprehend so much text, researchers have proposed using text retrieval techniques to help software developers with many of their daily tasks. To be useful, these techniques need to be properly configured, which requires calibrating many parameters. As most software developers are not experts in text retrieval, they need help in determining the best text retrieval configuration for a given software engineering context. The configuration problem is one of the main obstacles to the adoption of such techniques in the software industry, because many approaches proposed by researchers do not generalize well. The outcomes of this project will transform the way software developers address many of their daily tasks, allowing them to easily adopt text retrieval during software development. The results of this research will also be used in software engineering courses to support students in their projects, and the new practices the students acquire will help them become better software engineers. The proposed research also brings together work from two computing research communities, software engineering and information retrieval, and it will contribute new knowledge to both fields. Existing approaches that use text retrieval in software engineering will become practical rather than just promising, facilitating their migration from the lab into industry and academia.
The outcomes of this research will be: (1) a novel approach, called TRinSE2.0, which will achieve automatic, query-based text retrieval configuration at runtime; and (2) improvements to important software engineering tasks in practical settings, focusing on feature and bug location, impact analysis, traceability link recovery, and bug triage. TRinSE2.0 will be evaluated on open source data, in the classroom, and in industrial settings. The proposed work will transform the way text retrieval configuration is done in software engineering applications. New software-specific measures, as well as proven linguistic measures, will be used to capture query properties in the context of software engineering tasks and data sets. Machine learning algorithms will then find the best configuration for a given query. When writing a query to retrieve information from a software project, developers will get the best results, which will save them time and effort and improve their productivity and the quality of their work. Text retrieval configuration will no longer be heuristic-based; it will become data-driven.
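The idea of choosing a configuration from measured query properties can be illustrated with a minimal Python sketch. It is not the TRinSE2.0 approach, whose measures and learning algorithms are not specified in this abstract. The sketch assumes three simple pre-retrieval measures (query length, mean and maximum inverse document frequency of the query terms) and a 1-nearest-neighbour rule as a stand-in for the machine learning step. The corpus, the training queries, and the configuration names (`vsm_stemming`, `lsi_no_stemming`) are hypothetical.

```python
import math
from collections import Counter

# Toy corpus standing in for software artifacts (hypothetical data).
CORPUS = [
    "parse config file and load settings",
    "fix null pointer in parser when file is empty",
    "render login dialog and validate user password",
    "load user settings from config file",
]

# Past queries labelled with the configuration that worked best for them
# (hypothetical labels, assumed to come from earlier evaluation runs).
TRAINING = [
    ("config file", "vsm_stemming"),
    ("null pointer parser empty login dialog password", "lsi_no_stemming"),
]


def tokenize(text):
    return text.lower().split()


def idf_table(docs):
    """Smoothed inverse document frequency for every term in the corpus."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((n + 1) / (count + 1)) + 1.0 for term, count in df.items()}


IDF = idf_table(CORPUS)
# Terms never seen in the corpus get the IDF of a term with document frequency 0.
DEFAULT_IDF = math.log(len(CORPUS) + 1) + 1.0


def query_features(query):
    """Assumed query properties: (length, mean IDF, max IDF)."""
    terms = tokenize(query)
    if not terms:
        return (0.0, 0.0, 0.0)
    weights = [IDF.get(term, DEFAULT_IDF) for term in terms]
    return (float(len(terms)), sum(weights) / len(weights), max(weights))


def choose_configuration(query, training=TRAINING):
    """1-nearest-neighbour: reuse the configuration of the most similar past query."""
    target = query_features(query)
    nearest = min(training, key=lambda ex: math.dist(target, query_features(ex[0])))
    return nearest[1]


if __name__ == "__main__":
    print(choose_configuration("load settings"))
    print(choose_configuration("null pointer parser empty login dialog"))
```

The features are left unscaled, so query length dominates the distance in this toy setting. A realistic version would normalize the features and learn from many labelled queries.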