Improved Hybrid Sampling Strategy for Software Defect Prediction of Imbalanced Data Distribution

K. Nitalaksheswara Rao
DOI: 10.4018/978-1-6684-4225-8.ch013

Abstract

Software defect prediction using data mining techniques is one of the best practices for finding defective modules. Existing classification techniques can be used for efficient knowledge discovery on normal datasets. Most real-world data sources, however, are biased toward one of the classes. Data sources of this type are known as class-imbalanced or skewed data sources. The defect prediction rate on class-imbalanced datasets decreases as the degree of imbalance increases. Handling such datasets requires a specifically designed approach for improved performance. In this chapter, the authors propose an algorithm known as the improved integrated sampling strategy (IISS), which uses a noise removal strategy to improve performance in software defect prediction. The experimental analysis conducted on skewed software defect prediction datasets shows that the IISS algorithm performs well when compared with the C4.5, C4.5+Balance, RF, and RF+Balance algorithms on various class imbalance evaluation measures.

Introduction

Software engineering is the process of building software with the properties the user desires. The complete process consists of phases such as requirements analysis, design, coding, and testing. Complete or exhaustive testing to find all the errors in the software modules is a tedious job.

This research uses a strategy that replicates and generates instances in the minority subset while at the same time reducing the instances in the majority subset. The proposed technique is known as the improved integrated sampling strategy (IISS) because it integrates both sampling strategies in a single method. The rationale behind combining the two strategies is to address the issues of both the majority and minority subsets. Combining these strategies in a single method is challenging, as their counter-effects must be properly taken into consideration in the learning process for the class imbalance problem of software defect prediction.
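The idea of oversampling the minority subset while undersampling the majority subset can be illustrated with a minimal Python sketch. It is not the IISS algorithm itself: it assumes plain random replication of minority instances and plain random removal of majority instances, and the function name `hybrid_resample` and the `target_size` parameter are hypothetical names introduced only for illustration.

```python
import random


def hybrid_resample(rows, labels, minority_label, target_size, seed=0):
    """Illustrative hybrid sampling (an assumption, not the chapter's IISS).

    Minority instances are replicated at random until there are at least
    `target_size` of them; majority instances are randomly reduced to at
    most `target_size`.
    """
    rng = random.Random(seed)
    minority = [(r, y) for r, y in zip(rows, labels) if y == minority_label]
    majority = [(r, y) for r, y in zip(rows, labels) if y != minority_label]
    if not minority or not majority:
        raise ValueError("both classes must be present")

    # Oversampling: replicate minority instances (sampling with replacement).
    extra = max(0, target_size - len(minority))
    minority = minority + rng.choices(minority, k=extra)

    # Undersampling: keep a random subset of the majority instances.
    if len(majority) > target_size:
        majority = rng.sample(majority, target_size)

    combined = minority + majority
    rng.shuffle(combined)
    new_rows = [r for r, _ in combined]
    new_labels = [y for _, y in combined]
    return new_rows, new_labels
```

For a dataset with 2 defective and 18 non-defective modules, calling the function with `target_size=6` would yield 6 instances of each class. Random replication and removal are the simplest possible choices; the counter-effects mentioned above (for example, duplicated noise in the minority subset or lost information in the majority subset) are exactly what a more careful strategy would need to control.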

Figure 1. A common software defect prediction process

A common method for software defect prediction on class-imbalanced data needs to be accurate and precise in spite of the very small number of defective module instances. Developing such a model is therefore ineffective in practical implementation because of the very high imbalance ratio. In this study, we propose to use correlation-based oversampling, an instance-range-specific undersampling strategy, and improved integrated sampling techniques to help improve both the majority and minority subsets. The main rationale behind the approach is to use the feature-to-feature correlation index and the feature-to-class correlation index in the implementation of the improved correlation-based oversampling algorithm to learn the range of instances. The proposals are supported by a sound experimental setup for the effective evaluation of class-imbalanced software defect datasets, and the approach significantly improves classification over a decision tree baseline.
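The two correlation indices mentioned above can be made concrete with a short Python sketch. The text does not state how the indices are computed, so this example assumes the Pearson correlation coefficient and numeric 0/1 class labels; the function names `pearson` and `correlation_indices` are hypothetical and chosen only for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / math.sqrt(var_x * var_y)


def correlation_indices(rows, labels):
    """Return (feature-to-class list, feature-to-feature matrix).

    Assumption: both indices are Pearson correlations, which the
    source text does not confirm.
    """
    columns = list(zip(*rows))  # transpose instances into feature columns
    feature_class = [pearson(col, labels) for col in columns]
    feature_feature = [[pearson(a, b) for b in columns] for a in columns]
    return feature_class, feature_feature
```

Under this assumption, a feature with a high absolute feature-to-class value is informative about defectiveness, while a pair of features with a high absolute feature-to-feature value is largely redundant. How these values are then used to choose the range of instances for oversampling is not specified in this section.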

Recent research in software defect prediction learning has not laid much stress on whether software defect prediction can be implemented efficiently in all scenarios (Zi, Li, 2018). Software defect prediction is also usually considered in a class-balanced framework, where all classes are regarded as equal. The main focus of our research is to overcome the issues that arise in high imbalance ratio scenarios in the knowledge discovery process of software defect prediction. The proposed Improved Integrated Sampling Strategy (IISS) is capable of effectively handling knowledge discovery from skewed software defect prediction datasets.

The remainder of the paper is organized as follows. The recent literature on class imbalance learning in connection with software defect prediction is presented in Section 2. The detailed problem statement with objectives is presented in Section 3. The proposed approach of the Integrated Sampling Strategy is introduced in Section 4 and presented in detail in Section 5. The datasets and evaluation criteria are presented in Section 6. The proposed algorithm is compared with benchmark algorithms, and the details are presented in Section 7. Section 8 presents the concluding remarks and the scope for future work.

Analysis of execution traces using special-purpose databases has been conducted to improve quality in the software development process (Florian, 2019). Different inherent properties, strategic learning tools, and performance evaluation models have been used to conduct a review of software quality analysis (Ayse Tousan, 2009). A better paradigm of software testing strategy was recently presented for consumer support and maintenance in the field of durable electronic systems (Hasan, 2012).
