1. Introduction
Identifying and fixing bugs is one of the most difficult and time-consuming tasks in the software development lifecycle (SDLC). Failure to manage complexity and to identify bug-prone modules results in projects that are late, over budget, and deficient in their implicit and stated quality requirements. One promising approach for detecting bugs in software before release is defect prediction. Prediction techniques estimate a bug score (for example, the probability of bugs or the number of bugs) for each module of the software. The aim is to obtain a ranked, prioritized list of modules so that bug detection and fixing effort can be allocated optimally.
The bug prediction problem has been researched thoroughly (Bernstein et al., 2007; D’Ambros et al., 2010; El Emam et al., 2001; Graves et al., 2000; Gyimothy et al., 2005; Herzig, 2014; Kim et al., 2007; Nagappan & Ball, 2005b; Ratzinger et al., 2008). Using historical project data, extracted from software configuration management (SCM) systems such as CVS or Subversion and/or from bug tracking systems such as Bugzilla or Jira, the task is to predict the bug score of software modules in future releases. A variety of models have been designed to tackle the problem, relying on diverse information: code metrics such as lines of code, response for class, coupling between objects, and complexity (Basili et al., 1996; Hassan, 2009; Nagappan et al., 2006; Nagappan & Ball, 2005a; Zimmermann et al., 2007); process metrics such as the number of changes and the number of refactorings (Di Nucci et al., 2017, 2015; Graves et al., 2000; Mnkandla & Mpofu, 2016; Moser et al., 2008; Rahman & Devanbu, 2013; Van Rysselberghe, 2008; Zimmermann et al., 2007); and past defects (Felix & Lee, 2017; Hassan & Holt, 2005; Joshi et al., 2007; Kim et al., 2007; Zimmermann et al., 2007). Research has focused mostly on proposing new models and techniques for predicting bugs rather than on validating the performance of existing approaches. Yet existing approaches must be validated before their results are generalized and used in practice. Validation matters even more when a method or technique claims to outperform some or all of the established approaches in a problem domain.