Timely and accurate prediction of the quality of software modules in the early stages of the software development life cycle is very important in the field of software reliability engineering. With such predictions, a software quality assurance team can direct its limited quality improvement resources to the areas that need them and prevent problems from occurring during system operation. Software metrics-based quality estimation models are tools that can achieve such predictions. They are generally of two types: a classification model that predicts the membership of modules in two or more quality-based classes (Khoshgoftaar et al., 2005b), and a quantitative prediction model that estimates the number of faults (or some other quality factor) that are likely to occur in software modules (Ohlsson et al., 1998). In recent years, a variety of techniques have been developed for software quality estimation (Briand et al., 2002; Khoshgoftaar et al., 2002; Ohlsson et al., 1998; Ping et al., 2002), most of which are suited for either prediction or classification, but not for both. For example, logistic regression (Khoshgoftaar & Allen, 1999) can only be used for classification, whereas multiple linear regression (Ohlsson et al., 1998) can only be used for prediction. Some software quality estimation techniques, such as case-based reasoning (Khoshgoftaar & Seliya, 2003), can be used to calibrate both prediction and classification models; however, they require a distinct modeling approach for each type of model. In contrast to such software quality estimation methods, count models such as the Poisson regression model (PRM) and the zero-inflated Poisson (ZIP) regression model (Khoshgoftaar et al., 2001) can yield both a fault prediction and a quality-based classification with just one modeling approach. Moreover, count models are capable of providing the probability that a module has a given number of faults.
Despite the attractiveness of calibrating software quality estimation models with count modeling techniques, we believe that their application in software reliability engineering has been very limited (Khoshgoftaar et al., 2001). This study can be used as a basis for assessing the usefulness of count models for predicting the number of faults and the quality-based class of software modules.
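To make concrete how a single count model can supply a fault probability, a fault prediction and a class label, the following minimal sketch evaluates the standard Poisson and zero-inflated Poisson probability mass functions for one module. The mean mu and zero-inflation probability pi are taken as given rather than estimated from metrics, and the classification rule (label a module fault-prone when the probability of reaching a fault threshold is at least a cutoff) is an assumption made for illustration, not necessarily the rule used in the cited studies. All function and parameter names are hypothetical.

```python
import math


def poisson_pmf(k, mu):
    """P(Y = k) for a Poisson-distributed fault count with mean mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)


def zip_pmf(k, mu, pi):
    """P(Y = k) under a zero-inflated Poisson model.

    With probability pi the module is in a state that always yields
    zero faults; otherwise its fault count follows Poisson(mu).
    """
    base = (1.0 - pi) * poisson_pmf(k, mu)
    return pi + base if k == 0 else base


def expected_faults(mu, pi=0.0):
    """Quantitative prediction: the mean of the count distribution."""
    return (1.0 - pi) * mu


def prob_at_least(threshold, mu, pi=0.0):
    """P(Y >= threshold), computed from the same count model."""
    return 1.0 - sum(zip_pmf(k, mu, pi) for k in range(threshold))


def classify(mu, threshold, pi=0.0, cutoff=0.5):
    """Assumed rule: fault-prone if P(Y >= threshold) >= cutoff."""
    if prob_at_least(threshold, mu, pi) >= cutoff:
        return "fault-prone"
    return "not fault-prone"
```

Setting pi to zero reduces the ZIP functions to the plain Poisson case, so the same three calls (zip_pmf, expected_faults, classify) cover both models named above.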
Software Metrics and Software Quality Modeling
Software product and process metrics are essential in the software development process. With metrics, the software development team is able to evaluate, understand, monitor and control a software product or its development process from original specifications all the way up to implementation and customer usage.
In the software reliability engineering literature, the relationship between software complexity metrics and the occurrence of faults in program modules has been used by various metrics-based software quality estimation models, such as case-based reasoning (Khoshgoftaar & Seliya, 2003), regression trees (Khoshgoftaar et al., 2002), fuzzy logic (Xu et al., 2000), genetic programming (Liu & Khoshgoftaar, 2001) and multiple linear regression (Ohlsson et al., 1998). Typically, a software quality model for a given software system is calibrated using the software metrics and fault data collected from a previous system release or similar project. The trained model is then applied to predict the software quality of a release currently under development or of a comparable project. The resources allocated for software quality improvement initiatives can subsequently be targeted toward program modules that are of low quality or are likely to have many faults.
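The calibrate-then-apply workflow described above can be sketched as follows, using Poisson regression with a single software metric as the illustrative model. This is a minimal sketch under stated assumptions: the metric values, fault counts and module names are invented toy data, the model has only one predictor, and the Newton-Raphson fitting routine is a hand-written stand-in for whatever estimation procedure a real study would use.

```python
import math


def fit_poisson_regression(xs, ys, iterations=25):
    """Fit log(mu) = b0 + b1 * x by Newton-Raphson on the Poisson log-likelihood."""
    b0 = math.log(max(sum(ys) / len(ys), 1e-8))  # start at the log of the mean count
    b1 = 0.0
    for _ in range(iterations):
        mus = [math.exp(b0 + b1 * x) for x in xs]
        # Gradient of the log-likelihood with respect to (b0, b1).
        g0 = sum(y - m for y, m in zip(ys, mus))
        g1 = sum((y - m) * x for x, y, m in zip(xs, ys, mus))
        # Fisher information (negative Hessian), a symmetric 2x2 matrix.
        h00 = sum(mus)
        h01 = sum(m * x for x, m in zip(xs, mus))
        h11 = sum(m * x * x for x, m in zip(xs, mus))
        det = h00 * h11 - h01 * h01
        if abs(det) < 1e-12:
            break
        # Newton step: solve the 2x2 system in closed form.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def predict_faults(b0, b1, x):
    """Expected number of faults for a module with metric value x."""
    return math.exp(b0 + b1 * x)


# Hypothetical previous release: one scaled complexity metric and the
# observed fault count per module (toy values for illustration only).
prev_metric = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
prev_faults = [0, 1, 1, 2, 4, 6]
b0, b1 = fit_poisson_regression(prev_metric, prev_faults)

# Hypothetical modules of the release under development.
current = {"mod_a": 0.8, "mod_b": 2.7, "mod_c": 1.9}
ranked = sorted(current, key=lambda m: predict_faults(b0, b1, current[m]), reverse=True)
```

The list ranked orders the current modules from the highest to the lowest predicted fault count, which is the order in which quality improvement resources would be targeted under this sketch.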