Increasing the Accuracy of Software Fault Prediction using Majority Ranking Fuzzy Clustering

Golnoush Abaei (University Technology Malaysia, Johor, Malaysia) and Ali Selamat (UTM-IRDA Digital Media Centre, K-Economy Research Alliance UTM & University Technology Malaysia, Johor, Malaysia)
Copyright: © 2014 |Pages: 12
DOI: 10.4018/ijsi.2014100105

Abstract

Although many software fault prediction models have been proposed, the area remains open, as there is still room for a stable and consistent model with better performance. In this paper, a new method is proposed to increase the accuracy of fault prediction based on the notion of fuzzy clustering and majority ranking. The authors investigated the effect of irrelevant and inconsistent modules on software fault prediction and tried to decrease it by designing a new framework in which all project modules are clustered. The obtained results showed that fuzzy clustering could decrease the negative effect of irrelevant modules on prediction performance. Eight data sets from NASA and Turkish white-goods software are employed to evaluate the model. Performance evaluation in terms of false positive rate, false negative rate, and overall error showed the superiority of the model over other prediction models. The proposed majority ranking fuzzy clustering approach showed improvements of 3% to 18% in false negative rate and 1% to 4% in overall error, compared with other available models (ACF and ACN), in more than half of the testing cases. According to the results, the system can be used to guide testing effort by identifying fault-prone modules, improving the quality of software development and software testing under limited time and budget.

Introduction

Since software projects play a major role in today's industry, accurate estimation of software development cost is crucial. According to the Standish Group report (2009), just 32% of software projects were on time and on budget in 2009, 44% were challenged, and 24% had been canceled. Designing, developing, testing, and all other aspects of software projects are affected by the relevant estimations and predictions. Software testing is known to be a major factor in increasing development cost. Faulty modules pose significant risk by decreasing customer satisfaction and by increasing testing and maintenance costs. Early detection of fault-prone software components could enable verification experts and testers to concentrate their time and resources on the problematic areas of the system under development. The area of software fault prediction still poses many challenges and, unfortunately, none of the techniques proposed within the last decade has achieved widespread applicability in the software industry. During the recent decade, many software fault prediction models have been proposed; however, selecting the best method among them seems to be impossible, because the performance of each method depends on various factors such as the software measurement metrics used, the available information, the machine-learning technique, and so on. Nevertheless, the main aim of all methods is to produce accurate results.

Soft computing methods have recently become popular in all prediction areas. Soft computing is a field within computer science that is characterized by the use of inexact solutions. It differs from conventional (hard) computing in that it deals with imprecision, uncertainty, partial truth, and approximation to achieve practicability, robustness, and low solution cost (Zadeh, 1965). Components of soft computing include neural networks, support vector machines, fuzzy logic, evolutionary computation, and so on.

In the fault prediction process, previously reported fault data, along with distinct metrics, are used to identify the fault-prone modules. However, outliers and irrelevant data in the training set can lead to imprecise predictions. In fact, in many engineering problems we encounter vagueness in information and uncertainty in training sets, and these phenomena can prevent a proposed solution from reaching the expected results. Our system models the vagueness of the input information through fuzzy clusters, and fault prediction is based on majority ranking of the three fuzzy clusters most similar to the test data. This system provides more accurate results than existing methods based on different classification techniques. Based on our proposed model, we construct three research questions, listed as follows:

  • RQ1: Does fuzzy clustering with majority ranking perform better than two well-performing learning methods in fault prediction modeling, namely naïve Bayes and random forest?

  • RQ2: Does fuzzy clustering with majority ranking perform better than two well-performing learning methods in fault prediction modeling, namely naïve Bayes and random forest, when two-stage outlier removal is applied to the data sets?

  • RQ3: How does our proposed model perform when two different collections of data sets are used for the training and testing processes?
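
The idea described above, fuzzy clusters followed by a majority vote among the three clusters most similar to a test module, can be illustrated with a minimal sketch. This is not the authors' implementation: the preview does not specify the clustering algorithm's parameters, the similarity measure, or how clusters are labeled. The sketch assumes plain fuzzy c-means, Euclidean distance to cluster centers as the similarity measure, and a hypothetical rule that marks a cluster as faulty when its membership-weighted share of faulty modules exceeds a threshold. All function and parameter names are illustrative.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, seed=0):
    """Plain fuzzy c-means. Returns (centers, memberships)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)            # each row sums to 1
    for _ in range(n_iter):
        W = U ** m                               # fuzzified memberships
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)           # avoid division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return centers, U

def label_clusters(U, y, threshold=0.5):
    """Assumed labeling rule: a cluster is faulty when its
    membership-weighted share of faulty modules exceeds threshold."""
    share = (U.T @ np.asarray(y, dtype=float)) / U.sum(axis=0)
    return share > threshold

def predict(x, centers, cluster_is_faulty, k=3):
    """Majority vote among the k clusters whose centers are closest to x."""
    dist = np.linalg.norm(np.asarray(centers) - np.asarray(x), axis=1)
    nearest = np.argsort(dist)[:k]
    votes = int(np.asarray(cluster_is_faulty)[nearest].sum())
    return votes * 2 > k
```

In this sketch, a module described by a metric vector `x` would be classified with `predict(x, centers, label_clusters(U, y))` after running `fuzzy_c_means` on the training modules. Using an odd `k` such as 3 avoids tied votes.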

The remainder of this paper is organized as follows. Section 2 presents a brief discussion of related work. Fuzzy clustering is reviewed in section 3. Section 4 contains our proposed method. Experimental descriptions are presented in section 5. Experimental results and analysis are described in section 6, and finally, we summarize this paper in section 7.

According to Catal (2011), software fault prediction has been one of the noteworthy research topics since 1990, and the literature includes two recent and comprehensive systematic reviews (Catal & Diri, 2009b; Hall, Beecham, Bowes, Gray, & Counsell, 2012). The prediction techniques use approaches that originated from the field of either statistics or machine learning. Some of these techniques are decision trees (Koprinska, Poon, Clark, & Chan, 2007), neural networks (Thwin & Quah, 2005), naïve Bayes (Menzies, Greenwald, & Frank, 2007), fuzzy logic (Yuan, Khoshgoftaar, Allen, & Ganesan, 2000), and the artificial immune recognition system algorithms (Catal & Diri, 2007a, 2007b, 2009a). Because the body of related work in this area is very large, we present only some of it in this section.
