Investigation of Software Reliability Prediction Using Statistical and Machine Learning Methods


Pradeep Kumar, Abdul Wahid
Copyright: © 2017 |Pages: 21
DOI: 10.4018/978-1-5225-2229-4.ch012


Software reliability is a statistical measure of how well software operates with respect to its requirements. There are two related software engineering research issues concerning reliability requirements. The first is achieving the necessary reliability, i.e., choosing and employing appropriate software engineering techniques in system design and implementation. The second is assessing reliability as a means of assurance before the system is deployed. In the past few years, various software reliability models have been introduced. These models have been developed in response to the need of software engineers, system engineers, and managers to quantify the concept of software reliability. This chapter investigates the performance of some classical and intelligent machine learning techniques, such as linear regression (LR), radial basis function network (RBFN), generalized regression neural network (GRNN), and support vector machine (SVM), in predicting software reliability. The effectiveness of LR and the machine learning methods is demonstrated using sixteen datasets taken from the Data & Analysis Centre for Software (DACS). Two performance measures obtained from rigorous experiments, root mean squared error (RMSE) and mean absolute percentage error (MAPE), are compared quantitatively.
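The two performance measures can be computed as in the following minimal Python sketch. The function names `rmse` and `mape` and the sample values are hypothetical illustrations, not the chapter's code or DACS data. MAPE is expressed in percent and is undefined when an observed value is zero.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between observed and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an observed value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)

# Hypothetical cumulative-failure counts, for illustration only (not DACS data).
observed = [10, 20, 30, 40]
forecast = [12, 18, 33, 40]
print(rmse(observed, forecast))
print(mape(observed, forecast))
```

Lower values of both measures indicate a better fit; RMSE is in the units of the predicted quantity, while MAPE is scale-free, which makes it convenient for comparing models across datasets of different sizes.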
Chapter Preview


Software reliability modeling has gained importance in many critical and everyday applications, which has led to a tremendous amount of work in software reliability engineering. Software reliability growth models (SRGMs) have been used successfully to estimate and predict the number of errors remaining in software. Using these SRGMs, software practitioners and potential users can assess current and future reliability through testing. Many analytical models, such as times-between-failures models, nonhomogeneous Poisson process (NHPP) models, Markov process models, and operational profile models, have been proposed over the past four decades for software reliability prediction. SRGMs fall into two broad categories: parametric models and non-parametric models. Most parametric SRGMs are based on the NHPP and have been widely and successfully used in practice. Non-parametric SRGMs based on machine learning are more flexible and can predict reliability metrics such as cumulative failures detected, failure rate, time between failures, and next time to failure. Both parametric and non-parametric models can be used to estimate current reliability measures and predict their future trends. SRGMs can therefore serve as mathematical tools for measuring, assessing, and predicting software reliability quantitatively.
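As an illustration of a parametric NHPP-based SRGM, the sketch below uses the Goel-Okumoto mean value function m(t) = a(1 - exp(-bt)), where a is the expected total number of failures and b is the per-fault detection rate. This particular model is chosen only for illustration (the chapter does not single it out), and the parameter values are hypothetical; in practice a and b would be estimated from failure data. The expected number of remaining failures is a - m(t), and for any NHPP the probability of no failure in (t, t + x] is exp(-(m(t + x) - m(t))).

```python
import math

def go_mean_value(t, a, b):
    """Expected cumulative failures by time t: m(t) = a * (1 - exp(-b t))."""
    return a * (1.0 - math.exp(-b * t))

def go_intensity(t, a, b):
    """Failure intensity, the derivative of m(t): a * b * exp(-b t)."""
    return a * b * math.exp(-b * t)

def go_remaining(t, a, b):
    """Expected number of failures still to be observed after time t."""
    return a - go_mean_value(t, a, b)

def nhpp_reliability(x, t, a, b):
    """Probability of no failure in (t, t + x] for an NHPP with this m(t)."""
    return math.exp(-(go_mean_value(t + x, a, b) - go_mean_value(t, a, b)))

# Hypothetical parameters: a = 100 expected failures, b = 0.05 per hour.
a, b = 100.0, 0.05
for t in (0, 10, 50):
    print(t, round(go_mean_value(t, a, b), 2), round(go_remaining(t, a, b), 2))
print(round(nhpp_reliability(1.0, 50, a, b), 4))
```

The same structure applies to other NHPP models: only `go_mean_value` changes, and the intensity, remaining failures, and conditional reliability follow from it.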

Despite the application of various machine learning methods over the past few decades, non-homogeneous Poisson process (NHPP) based models have remained among the most attractive reliability growth models for monitoring and tracking reliability improvement. However, their strong underlying assumptions, whose validity and relevance in real-world scenarios are questionable, have limited their usefulness. On the other hand, the learning and generalization capability of artificial neural networks (ANNs), together with their proven success in solving complex problems, has made them a viable alternative for predicting software failures in the testing phase. The main advantage of ANNs over NHPP-based models is that they require only the failure history as input; no assumptions or a priori postulation of a parametric model are needed. Several regression techniques, such as linear regression, and machine learning methods (decision trees (DTs), ANNs, SVMs, and genetic algorithms (GA)) have been proposed in the literature for predicting software reliability. The major challenge for these models lies not in their technical soundness but in their validity and applicability to real-world projects, particularly modern computing systems. Linear regression (LR) is the most widely used and most easily understood method, but it seldom works well on real-life data: it is restricted to fitting straight-line functions and is therefore not well suited to modeling nonlinear relationships. Some empirical studies based on multivariate linear regression and neural network methods have been carried out to predict software reliability growth trends.
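Using "only the failure history as input" can be made concrete with a sliding window: each target value is paired with the `lag` observations that precede it, and any regressor (an ANN, an SVM, or LR) is then trained on these pairs. The minimal sketch below combines that windowing with closed-form simple linear regression on one lag. The helper names `make_windows` and `fit_simple_lr` and the failure series are hypothetical, and a single-lag straight-line fit is only one illustrative configuration, not the chapter's experimental setup.

```python
def make_windows(series, lag):
    """Pair each value with the `lag` values that precede it: (inputs, target)."""
    if lag < 1 or lag >= len(series):
        raise ValueError("lag must be at least 1 and smaller than the series length")
    return [(series[i - lag:i], series[i]) for i in range(lag, len(series))]

def fit_simple_lr(xs, ys):
    """Closed-form ordinary least squares for y = b0 + b1 * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Hypothetical cumulative failure counts at successive test intervals.
history = [5, 12, 18, 23, 27, 30, 32, 33]
pairs = make_windows(history, lag=1)
xs = [inputs[0] for inputs, _ in pairs]   # previous cumulative count
ys = [target for _, target in pairs]      # next cumulative count
b0, b1 = fit_simple_lr(xs, ys)
next_estimate = b0 + b1 * history[-1]     # one-step-ahead prediction
print(round(next_estimate, 2))
```

With more than one lag, each input becomes a vector and the fit becomes multivariate linear regression, which needs a general least-squares solver rather than the closed-form two-parameter formula used here.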

However, multivariate linear regression can capture linear relationships but requires a large sample size and more independent variables. In recent years, the use of support vector machines (SVMs) in place of classical techniques has shown a remarkable improvement in software reliability prediction. The design of an SVM is based on extracting a subset of the training data that serves as support vectors and therefore represents a stable characteristic of the data. A GRNN-based reliability prediction model incorporating test coverage information, such as blocks and branches, has been applied to software reliability prediction. The prediction accuracy of software reliability models can be further improved by adding other important factors that affect final software quality, such as historical information from software development, including developer capability, testing effort, and test coverage. SVMs represent the state of the art owing to their generalization performance, ease of use, and rigorous theoretical foundations, and they can be used in practice to solve regression problems.
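To show how a GRNN turns failure history into a prediction, the minimal sketch below implements the standard GRNN estimate: a Gaussian-kernel-weighted average of stored training targets, controlled by a single smoothing parameter sigma. The function name `grnn_predict`, the lagged inputs, and the sigma value are hypothetical; the chapter's GRNN configuration and the test-coverage inputs mentioned above are not reproduced here.

```python
import math

def grnn_predict(train_x, train_y, query, sigma):
    """GRNN estimate: Gaussian-kernel-weighted average of the training targets.

    train_x: list of input vectors; train_y: matching targets;
    query: input vector to predict for; sigma: smoothing parameter (> 0).
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if not train_x or len(train_x) != len(train_y):
        raise ValueError("training inputs and targets must be non-empty and aligned")
    # Squared Euclidean distance from the query to every stored pattern.
    d2 = [sum((xi - qi) ** 2 for xi, qi in zip(x, query)) for x in train_x]
    # Shifting by the smallest distance leaves the weight ratios unchanged
    # but keeps at least one weight equal to 1, avoiding 0/0 from underflow.
    d_min = min(d2)
    weights = [math.exp(-(d - d_min) / (2.0 * sigma ** 2)) for d in d2]
    return sum(w * y for w, y in zip(weights, train_y)) / sum(weights)

# Hypothetical data: inputs are the two previous cumulative failure counts,
# targets are the next count.
train_x = [[5, 12], [12, 18], [18, 23], [23, 27], [27, 30]]
train_y = [18, 23, 27, 30, 32]
print(round(grnn_predict(train_x, train_y, [30, 32], sigma=3.0), 2))
```

A small sigma makes the GRNN behave like a nearest-neighbour lookup, while a large sigma pulls every prediction toward the mean of the training targets. Because the output is always a weighted average of past targets, the GRNN cannot extrapolate beyond the observed range, which matters when cumulative failures are still growing.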

Key Terms in this Chapter

Basic Failure Intensity: Failure intensity that would exist at the start of system test for new operations for a project without reviews (requirements, design, or code) or fault tolerance.

Debugging: The process of detection, location, and correction of errors or bugs in hardware or software systems.

Failure Category: The set of failures that have the same kind of impact on users such as safety or security.

Failure Density: At any point in the life of a system, the incremental change in the number of failures per associated incremental change in time.

Data: The representation of facts or instructions in a manner suitable for processing by computers or analysis by humans.

Failure Rate: At a particular time, the rate of change of the number of units that have failed divided by the number of units surviving.

Client-Server Computing: Processing capability or available information distributed across multiple nodes.

Estimation: Determination of software reliability model parameters and quantities from failure data.

Execution Time: The time that a processor (or processors) spends executing non-filler operations, measured in execution hours.

Program: A set of complete instructions (operators with operands specified) that executes within a single computer and relates to the accomplishment of some major function.

Failure Intensity: Failures per time unit; an alternative way of expressing reliability.

Software Failure: A failure that occurs when the user perceives that the software has ceased to deliver the expected result with respect to the specified input values. The user may need to identify the severity level of failures, such as catastrophic, critical, major, or minor, depending on their impact on the system.

Error: An incorrect or missing action by a person or persons that causes a fault in a program. An error may be a syntax error, a misunderstanding of specifications, or a logical error. An error may lead to one or more faults.

Developer: An individual or team assigned a particular task.

Availability: The probability that a system or a capability of a system is functional at a given time in a specified environment or the fraction of time during which a system is functioning acceptably.

Constant Failure Rate: The period during which failures of some units occur at an approximately uniform rate.

Developed Code: New or modified executable delivered instructions.

Prediction: The determination of software reliability model parameters and quantities from characteristics of the software product and development process.

Corrective Action: A documented design, process, or materials change implemented and validated to correct the cause of a failure.

Software Fault: The manifestation of an error in software; an error leads to a software fault. Software faults can remain undetected until a software failure results.

Errors: Human actions that result in the software containing a fault. Examples of such faults are the omission or misinterpretation of the user’s requirements, a coding error, etc.

Client: A node that requests services in a network or that uses resources available through the servers.

Fault: A defect in a system that causes a failure when executed. A software fault is a defect in the code. Thus, a fault is the representation of an error, where representation is the mode of expression, such as narrative text, data flow diagrams, entity-relationship diagrams, or source code. Moreover, a fault may lead to many failures; that is, a particular fault may cause different failures depending on how it is exercised.

Correlation: A statistical technique that determines the relationship between two variables (dependent and independent).

Product: A software system that is sold to the customers.

Reliability: The probability, or the capability, of a system continuing to function without failure for a specified period in a specified environment. The period may be specified in natural or time units.

Software Error: An error made by a programmer or designer, e.g., a typographical error, an incorrect numerical value, an omission, etc.

Failure Time: Accumulated elapsed time at which a failure occurs.

Deviation: Any departure of system behavior in execution from expected behavior.

Bugs: The mistakes committed by the developers while coding the program(s).

Software Engineering: A systematic approach to the development and maintenance of software that begins with analysis of the software’s goals or purposes.
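Several of the key terms above are linked by simple formulas. Assuming a constant failure intensity lambda (an assumption that many systems under test do not satisfy), reliability over an execution period t is R(t) = exp(-lambda t). Availability, read as the fraction of time a system functions acceptably, is uptime divided by total observed time. The minimal sketch below uses hypothetical function names and figures.

```python
import math

def reliability_constant_intensity(failure_intensity, duration):
    """R(t) = exp(-lambda * t); valid only under a constant failure intensity."""
    if failure_intensity < 0 or duration < 0:
        raise ValueError("failure intensity and duration must be non-negative")
    return math.exp(-failure_intensity * duration)

def availability(uptime, downtime):
    """Fraction of observed time during which the system functions acceptably."""
    total = uptime + downtime
    if total <= 0:
        raise ValueError("total observed time must be positive")
    return uptime / total

# Hypothetical figures: 0.01 failures per execution hour over a 10-hour period,
# and 990 hours up versus 10 hours down.
print(round(reliability_constant_intensity(0.01, 10), 4))
print(availability(uptime=990.0, downtime=10.0))
```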
