SHF: Small: Causal Foundations of Statistical Fault Localization

$497,500 · FY2015 · CSE · NSF

Case Western Reserve University, Cleveland OH

Abstract

The goal of this research is to improve the effectiveness of automated techniques that seek to locate the faults in software that caused observed failures (malfunctions) to occur during testing or operational use, so the faults can be repaired. This goal is important because properly functioning software is critical in business, communications, national security, transportation, science, and many other activities. The desired improvements are to be achieved by employing methodology that has been developed recently, across several disciplines, to enable the causal effects of various kinds of treatments, exposures, or interventions (e.g., medical treatments) upon outcomes of interest (e.g., diseases) to be estimated accurately and without bias. If successful, the proposed research has the potential to help software developers to efficiently localize and repair faults in their products, thereby preventing harms such as economic loss, injury, and even death. The research will also help to disseminate sound causal inference methodology in the software engineering community. More specifically, the research will investigate and improve the foundations of causal statistical fault localization (CSFL), including the form of causal models, the abstraction of causal states, and the handling of iteration. A value-based approach to CSFL will be developed, which involves profiling and analyzing the values of program variables, and this will be integrated with predicate-based CSFL, in order to more accurately estimate the failure-causing effects of program elements. A new approach to CSFL will be explored that employs multilevel statistical models to integrate execution data of different types and granularity levels, both for a given program version and across versions. 
Meta-analysis techniques will be applied to the set of suspiciousness scores obtained with CSFL, in order to take account of features of the score distribution and of factors that predict the credibility of individual scores. Also to be investigated is how the problem of selection bias affects statistical fault localization (SFL) techniques in different settings and how it can be mitigated. Finally, the research will explore the potential value of case-control methodology for improving the cost-effectiveness of SFL in scenarios where software failures are infrequent.
