Classification of Bug Injected and Fixed Changes Using a Text Discriminator


Akihisa Yamada (Graduate School of Science and Technology, Kyoto Institute of Technology, Kyoto, Japan) and Osamu Mizuno (Graduate School of Science and Technology, Kyoto Institute of Technology, Kyoto, Japan)
Copyright: © 2015 | Pages: 13
DOI: 10.4018/ijsi.2015010104

Abstract

Approaches to detecting fault-prone modules have been studied for a long time. As one such approach, the authors previously proposed a technique based on text filtering. The technique assumes that bugs are related to the words and context contained in a software module, and therefore treats a module as text. Using a dictionary learned by classifying modules that induced bugs, it calculates the bug-inducing probability of a target module and judges whether that module is fault-prone. The predictive granularity of this technique is the module. In this study, the authors aim at prediction with a finer granularity: the portion that induces a bug. Specifically, they try to predict bug-inducing changes by applying text filtering to the source code differences between bug-inducing changes and the changes that precede them. Similarly, they try to predict bug-fixing changes by applying text filtering to the source code differences between bug-fixing changes and the changes that precede them. To show the effectiveness of the approach, the authors conducted two experiments on two open source projects, compared the approach with fault-prone filtering, and obtained higher accuracy.
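The idea of scoring a change by the tokens in its diff can be illustrated with a small sketch. The code below is not the authors' discriminator: it swaps in a plain naive Bayes classifier with add-one smoothing as a stand-in for the text filter, and the names `tokenize`, `DiffFilter`, `train`, and `bug_probability`, as well as the sample diffs, are hypothetical. It assumes unified-diff style input in which changed lines start with `+` or `-`.

```python
import math
import re
from collections import Counter

# Identifiers become one token each; any other non-space character is its own token.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\S")


def tokenize(diff_text):
    """Return the tokens found on added and removed lines of a diff."""
    tokens = []
    for line in diff_text.splitlines():
        # Simplification: lines starting with '+++' or '---' are treated as
        # file headers and skipped, as are unchanged context lines.
        if line.startswith(("+++", "---")) or not line.startswith(("+", "-")):
            continue
        tokens.extend(TOKEN_RE.findall(line[1:]))
    return tokens


class DiffFilter:
    """Naive Bayes stand-in for a text filter over change diffs (hypothetical)."""

    LABELS = ("buggy", "clean")

    def __init__(self):
        self.counts = {label: Counter() for label in self.LABELS}
        self.docs = {label: 0 for label in self.LABELS}

    def train(self, diff_text, label):
        """Add the tokens of one labeled diff to the learned dictionary."""
        self.counts[label].update(tokenize(diff_text))
        self.docs[label] += 1

    def bug_probability(self, diff_text):
        """Estimate the probability that a diff is bug-inducing."""
        vocab = set(self.counts["buggy"]) | set(self.counts["clean"])
        total_docs = sum(self.docs.values())
        tokens = tokenize(diff_text)
        log_score = {}
        for label in self.LABELS:
            total = sum(self.counts[label].values())
            # Smoothed class prior plus smoothed per-token likelihoods.
            score = math.log((self.docs[label] + 1) / (total_docs + 2))
            for tok in tokens:
                score += math.log(
                    (self.counts[label][tok] + 1) / (total + len(vocab) + 1)
                )
            log_score[label] = score
        # Normalize in a numerically stable way.
        top = max(log_score.values())
        weight = {k: math.exp(v - top) for k, v in log_score.items()}
        return weight["buggy"] / (weight["buggy"] + weight["clean"])


# Toy training data, invented purely for illustration.
diff_filter = DiffFilter()
diff_filter.train("+ if (ptr = NULL) return;", "buggy")
diff_filter.train("+ strcpy(buf, input);", "buggy")
diff_filter.train("+ if (ptr == NULL) return;", "clean")
diff_filter.train("+ strncpy(buf, input, sizeof(buf));", "clean")

print(diff_filter.bug_probability("+ strcpy(dst, src);"))
```

A change would be flagged as bug-inducing when the returned probability exceeds a chosen threshold such as 0.5. The same structure applies to bug-fixing changes by training on diffs labeled as fixes instead.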
Article Preview

Fault-prone module prediction is a mature area of software engineering, with various studies conducted over the past 20 years. Many of these studies date from 1999 onward.

Software metrics related to program attributes, such as lines of code, complexity, frequency of modification, cohesion, and coupling, have been used in many previous studies. In those studies, the metrics are treated as explanatory variables and fault-proneness as the objective variable, and mathematical models are constructed from the metrics. The selection of metrics varies from study to study. For example, Guo, Cukic, and Singh (2003), Menzies, Greenwald, and Frank (2007), and Seliya, Khoshgoftaar, and Zhong (2005) used NASA’s Metrics Data Program. Object-oriented metrics are used in Briand, Melo, and Wust (2002), for example. Some studies used metrics gathered with metrics collection tools (Bellini, Bruno, Nesi, & Rogai, 2005; Denaro & Pezze, 2002).

The selection of classification techniques also varies from study to study. Khoshgoftaar et al. performed a series of fault-prone prediction studies using various classification techniques, for example, classification and regression trees (Khoshgoftaar, Shan, & Allen, 2000), tree-based classification with S-PLUS (Khoshgoftaar, Allen, & Deng, 2002), the Treedisc algorithm (Khoshgoftaar & Allen, 2001), the Sprint-Sliq algorithm (Khoshgoftaar & Seliya, 2002), and logistic regression (Khoshgoftaar & Allen, 1999). The comparison was summarized in Khoshgoftaar and Seliya (2004). Logistic regression is a frequently used technique in fault-prone prediction (Briand et al., 2002; Denaro & Pezze, 2002; Khoshgoftaar & Allen, 1999). Menzies et al. (2007) compared three classification techniques and reported that the naive Bayesian classifier achieved the best accuracy.
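As a rough illustration of the metrics-based setup described above, the following sketch fits a logistic regression model with software metrics as explanatory variables and fault-proneness as the objective variable. It is a minimal example under stated assumptions, not a reproduction of any cited study: the function names `fit_logistic` and `predict_proba`, the two scaled metrics, and the toy data are all hypothetical, and the model is fitted with plain batch gradient descent.

```python
import math


def sigmoid(z):
    """Logistic function mapping a score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss.

    rows   -- list of metric vectors (assumed scaled to roughly 0..1)
    labels -- list of 0 (not fault-prone) or 1 (fault-prone)
    """
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            # Prediction error for this module.
            err = sigmoid(bias + sum(w * v for w, v in zip(weights, x))) - y
            for j, v in enumerate(x):
                grad_w[j] += err * v
            grad_b += err
        # Step against the gradient averaged over all modules.
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def predict_proba(weights, bias, x):
    """Estimated probability that a module with metrics x is fault-prone."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


# Hypothetical modules: [scaled lines of code, scaled complexity].
metrics = [
    [0.10, 0.10], [0.20, 0.15], [0.15, 0.20],  # modules without reported faults
    [0.80, 0.70], [0.90, 0.80], [0.70, 0.90],  # modules with reported faults
]
fault_prone = [0, 0, 0, 1, 1, 1]

weights, bias = fit_logistic(metrics, fault_prone)
print(predict_proba(weights, bias, [0.85, 0.80]))
print(predict_proba(weights, bias, [0.10, 0.10]))
```

In practice the metric values would come from a dataset or a metrics collection tool, and the fitted model would be evaluated on modules held out from training.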

Bug prediction based on the change history stored in version control systems has also been widely studied. Examples include the studies by Nagappan and Ball (2005) and by Kim et al. (Kim, Pan, & Whitehead, 2006; Kim, Zimmermann, Whitehead, & Zeller, 2007).
