Building Defect Prediction Models in Practice

Rudolf Ramler (Software Competence Center Hagenberg, Austria), Johannes Himmelbauer (Software Competence Center Hagenberg, Austria) and Thomas Natschläger (Software Competence Center Hagenberg, Austria)
DOI: 10.4018/978-1-4666-6026-7.ch024


Knowing which modules of a future version of a software system will be defect-prone is a valuable planning aid for quality managers and testers. Defect prediction promises to identify these modules. This chapter characterizes building a defect prediction model from data as an instance of a data-mining task and discusses key questions and consequences that arise when establishing defect prediction in a large software development project. Special emphasis is placed on how to choose a learning algorithm, select features from different data sources, deal with noise and data quality issues, and evaluate models for evolving systems. These discussions are accompanied by insights and experiences the authors have gained over the last few years in projects on data mining and defect prediction for large software systems. One of these projects serves as an illustrative use case throughout the chapter.
Chapter Preview

Data Mining And Knowledge Discovery For Defect Prediction

Defect prediction is based on prediction models built from software engineering data. It can therefore be understood as an application within the broad area of data mining and knowledge discovery, which covers the research results, techniques, and tools used to extract useful information and models from (large volumes of) data (Mariscal et al. 2010).

Key Terms in this Chapter

Prediction Model: A prediction model uses various attributes of a software system as independent variables. These attributes describe the parts (e.g., components) of the system and act as predictors for the dependent variables, which characterize the defect-proneness of those parts.

Version: A version of a software system relates the system’s implementation and all related artifacts to a specific point in time, usually when an instance of the system under development is released. In this chapter, the term “version” is used interchangeably with the term “release”.

Chance Level: The chance level is the accuracy reached by always predicting the majority class. For versions with more defective than defect-free components, the chance level equals the share of defective components.
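
The chance level definition can be illustrated with a short calculation. The following is a minimal Python sketch, not taken from the chapter: the function name `chance_level` and the example data are hypothetical, and each component is assumed to carry a single binary label (defective or defect-free).

```python
from collections import Counter


def chance_level(labels):
    """Return the accuracy of always predicting the majority class.

    `labels` holds one entry per component; True marks a defective
    component and False a defect-free one (an assumed encoding).
    """
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    # The majority class is predicted for every component, so the
    # accuracy equals that class's share of all components.
    return max(counts.values()) / len(labels)


# Hypothetical version with 10 components, 6 of them defective.
version_labels = [True] * 6 + [False] * 4
print(chance_level(version_labels))  # 0.6
```

In this example the defective components form the majority, so the chance level equals their share (6 of 10). A model's accuracy is only informative to the extent that it exceeds this baseline.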

Module: The term (software) module is used as an abstraction for a part of a software system at a defined level of granularity such as classes, files or larger components of the software system.

Defect: A defect subsumes various kinds of faults in a software system. Defects usually manifest themselves in the code but may also be found in specifications, documentation, auxiliary code or systems, etc.

Prediction: A prediction is the anticipation of a status or outcome in the future. The true status or outcome is unknown at the time when the prediction is made and can only be estimated with a certain degree of uncertainty.
