Special Offers
- IGI Global’s New Emerging Topic e-Book Collections
  Acquire highly focused and affordable Cutting-Edge Peer-Reviewed Research Content through a selection of 17 topic-focused e-Book Collections discounted up to 90%, compared to list prices. Collection topics include Artificial Intelligence, Data Science, Language Learning, Marketing and Customer Relations, Sustainability, and many more. Hosted on the InfoSci^® platform, these collections feature no DRM, no additional cost for multi-user licensing, no embargo of content, full-text PDF & HTML format, and more.
  Learn More
- Open Access Book (Free Access) - Encyclopedia of Information Science and Technology, Sixth Edition (ISBN: 9781668473665)
  The Encyclopedia of Information Science and Technology, Sixth Edition) continues the legacy set forth by the first five editions by providing comprehensive coverage and up-to-date definitions of the most important issues, concepts, and trends pertaining to technological advancements and information management within a variety of settings and industries. The entire book is being published under open access.
  Read Now
- Open Access Book (Free Access) - Food Sustainability, Environmental Awareness, and Adaptation and Mitigation Strategies for Developing Countries (ISBN: 9781668456293)
  Food Sustainability, Environmental Awareness, and Adaptation and Mitigation Strategies for Developing Countries provides information on the recent technology, mitigation, and environmental protection that must be applied for food sustainability in developing countries. This book is being published under Platinum Open Access through funding from Diponegoro University, Indonesia.
  Read Now
- Open Access Book (Free Access) - New Models of Higher Education: Unbundled, Rebundled, Customized, and DIY (ISBN: 9781668438091)
  The Walmart Corporation and the Lumina Foundation have provided funding to make New Models of Higher Education: Unbundled, Rebundled, Customized, and DIY fully open access, completely removing any paywall between scholars in education and the latest research on new models for the future of higher education.
  Read Now
- Open Access Book (Free Access) - Handbook of Research on the Global View of Open Access and Scholarly Communications (ISBN: 9781799898054)
  Through a collaboration between IGI Global and the University of North Texas, the Handbook of Research on the Global View of Open Access and Scholarly Communications has been published as fully open access, completely removing any paywall between researchers of any field, and the latest research on the equitable and inclusive nature of Open Access and all of its complications.
  Read Now
Books
- - Books by Subject
  - Business, Administration, & Management
  - Scientific, Technical, & Medical (STM)
  - Education
  - Books by Field
Journals
- - Journals
  - OnDemand Journal Articles
  - Journals by Subject
  - Business, Administration, & Management
  - Scientific, Technical, & Medical (STM)
  - Education
  - Journals by Field
e-Collections
OnDemand
Open Access
- View All Open Access Opportunities
  Search across all of IGI Global’s available open access publishing opportunities to unleash your research potential.
  Find an Open Access Journal for Your Next Manuscript
  Search across all of IGI Global’s available open access publishing opportunities to unleash your research potential.
  Submit an Open Access Book Proposal
  Learn more about open access book publishing and how it can propel your research forward in the field.
  Convert Your Work to Open Access
  Already published? You can convert your work to open access to increase its impact through IGI Global’s Restrospective Open Access Program.
  Utilize Open Access Collection Database
  Open up your research potential by utilizing our open access content or integrating the open access collection into your library
  Consider Open Access Agreements
  For Libraries: consider no-cost or investment-level open access agreements with IGI Global to support your faculty's research endeavors.
  Search Funding Resources
  Looking for additional funding resources to support your open accesss endeavors? View industry resources compiled by our open access team.
  Review Open Access Policies & Ethical Guidelines
  Considering IGI Global to publish your work under open access? Review IGI Global’s open access policies and ethical guidelines
Publish with Us
Resources
- - Instructors
  - Course Adoption
  - Teaching Cases
  - K-12 Online Learning Collection
  - Authors and Editors
  - eEditorial Discovery^® System
  - Peer Review Process
  - Ethics and Malpractice
  - COPE Membership
  - Fair Use Policy
  - Open Access Publishing
  - FAQ
Catalogs
About Us

Combining Data Preprocessing Methods With Imputation Techniques for Software Defect Prediction

Misha Kakkar, Sarika Jain, Abhay Bansal, P.S. Grover

Source Title: International Journal of Open Source Software and Processes (IJOSSP) 9(1)

DOI: 10.4018/IJOSSP.2018010101

OnDemand:

(Individual Articles)

Available

$37.50

Current Special Offers

No Current Special Offers

Abstract

Software Defect Prediction (SDP) models are used to predict, whether software is clean or buggy using the historical data collected from various software repositories. The data collected from such repositories may contain some missing values. In order to estimate missing values, imputation techniques are used, which utilizes the complete observed values in the dataset. The objective of this study is to identify the best-suited imputation technique for handling missing values in SDP dataset. In addition to identifying the imputation technique, the authors have investigated for the most appropriate combination of imputation technique and data preprocessing method for building SDP model. In this study, four combinations of imputation technique and data preprocessing methods are examined using the improved NASA datasets. These combinations are used along with five different machine-learning algorithms to develop models. The performance of these SDP models are then compared using traditional performance indicators. Experiment results show that among different imputation techniques, linear regression gives the most accurate imputed value. The combination of linear regression with correlation based feature selector outperforms all other combinations. To validate the significance of data preprocessing methods with imputation the findings are applied to open source projects. It was concluded that the result is in consistency with the above conclusion.

Article Preview

Top

Introduction

Software Defect Prediction (SDP) models enable quality support teams to predict defect prone software artifacts in advance, which in-turn helps in effective resource allocation and utilization. The development of SDP model starts with data collection from various software repositories. This collected data is expected to be complete, however, it is sometimes incomplete and noisy. This incomplete data may be the result of cross-company nature of dataset or some computational errors (García et al., 2015; Lakshminarayan et al., 1999). The performance of SDP model developed from an incomplete dataset is questionable and also, many machine-learning algorithms won’t be able to process such datasets.

There are many ways to deal with an incomplete dataset like - deletion techniques, toleration techniques, and imputation techniques. Deletion techniques recommend the deletion of all the instances, which include missing values, thus resulting in loss of important data. In toleration techniques, missing values are replaced by mean/ mode values, which is also not the best alternative method. Imputation is the most appropriate technique, which estimates missing values by analyzing the observed/available data. Researchers have proposed various imputation algorithms that are based on the accuracy of classifiers, which are trained using imputed values in the training dataset (Batista and Monard, 2003; Farhangfar et al., 2007; Saar-Tsechansky and Provost, 2007). Most of the imputation algorithms (Ma et al., 2006; Song et al., 2011) are trained under supervised learning, which uses complete dataset as the training dataset to compute missing values in the test dataset. Thus, complete dataset’s quality affects the performance of the imputation technique, which in turn affects the performance of SDP model.

Data preprocessing techniques are used to deal with the other issue related to collected data, i.e. noisy data. Instance selection and feature selection are two significant data preprocessing steps, which aim to eliminate noisy data and reduce the size of data set by filtering out non-relevant software metrics (Gupta, 2013; Gao and Khoshgoftaar, 2014 and García et al., 2015; Kale). Feature Selection methods select most relevant software metrics, which contribute maximum to the prediction process. Instance selection methods select the most relevant instances, which contribute to the prediction process.

In this study, the authors investigate prediction capability of SDP model if either instance selection or feature selection is performed as an additional step in combination with the imputation technique. In other words, we examine which one of the two- instance selection or feature selection is more advantageous in the process of SDP model building with missing value dataset.

Complete Article List

Search this Journal:

Reset

Volume 15: 1 Issue (2024): Forthcoming, Available for Pre-Order

Volume 14: 1 Issue (2023)

Volume 13: 4 Issues (2022): 1 Released, 3 Forthcoming

Volume 12: 4 Issues (2021)

Volume 11: 4 Issues (2020)

Volume 10: 4 Issues (2019)

Volume 9: 4 Issues (2018)

Volume 8: 4 Issues (2017)

Volume 7: 4 Issues (2016)

Volume 6: 1 Issue (2015)

Volume 5: 3 Issues (2014)

Volume 4: 4 Issues (2012)

Volume 3: 4 Issues (2011)

Volume 2: 4 Issues (2010)

Volume 1: 4 Issues (2009)

View Complete Journal Contents Listing

MLA

APA

Chicago

Export Reference

Combining Data Preprocessing Methods With Imputation Techniques for Software Defect Prediction

Abstract

Introduction

Complete Article List