Software Defect Prediction Using Machine Learning Techniques

G. Cauvery (St. Joseph's College of Arts and Science for Women, India), Dhina Suresh (St. Joseph's College of Arts and Science for Women, India), G. Aswini (St. Joseph's College of Arts and Science for Women, India), P. Jayanthi (St. Joseph's College of Arts and Science for Women, India), and K. Kalaiselvi (St. Joseph's College of Arts and Science for Women, India)
Copyright: © 2023 | Pages: 16
DOI: 10.4018/979-8-3693-1301-5.ch010

Abstract

Software defect prediction gives development teams observable results and influences both business outcomes and development flaws. By anticipating problematic code sections, developers can uncover flaws and plan test activities. Early identification depends on classification accuracy, that is, the percentage of classifications that make the right prediction. Software defect data sets are supported and at least partially recognized because of their large size and dimension. The confusion matrix, precision, recall, identification accuracy, and related measures are assessed and compared with existing schemes in a systematic research analysis. Previous research has relied on weak simulation tools for software analysis, whereas this study proposes building three machine learning models using linear regression, a KNN classifier, and random forest (RF). According to the analytical investigation, the suggested approach will offer more beneficial options for predicting device failures.
Chapter Preview

1. Introduction

Creating a software system nowadays requires careful planning, analysis, implementation, testing, integration, and maintenance (Viswakarma, et al., 2014). The task of a software engineer is to design a system within a set deadline and budget, which is accomplished during the planning stage (Alajmi & Khan, 2013). Flaws such as bad design, faulty logic, and improper data processing may arise during the development process; these flaws result in errors that force rework, driving up the cost of development and maintenance (Chitra, et al., 2018). All of these contribute to a decline in consumer satisfaction (Garg, et al., 2017). For this reason, faults are categorized according to their severity, and corrective and proactive measures are implemented in line with the severity determined (Chopra, et al., 2022). Over the past ten years, attention has increasingly centered on software-based systems, with software quality being seen as the most important factor in user functionality (Hung & Chakrabarti, 2022).

Despite the widespread development of application software, subpar results for both commercial and personal apps can be attributed to a lack of software quality (Jain, et al., 2023). Defect prediction models are widely used in industry, and they help with tasks like fault prediction, effort estimation for software reliability testing, hazard analysis, and more (Jayalakshmi & Ramesh, 2020). A supervised machine learning algorithm is fed a predefined training data set (Kayalvizhi & Ramesh, 2020). The algorithm then produces rules based on what it has learned from the training data set in order to predict the class label for a new data set (Tiwari, et al., 2018). Mathematical approaches are used during the learning phases to develop and enhance the prediction function (Khan, 2016). This technique employs training data with known input attribute values and known output values (Khan, 2021). The output predicted by the ML algorithm is compared with the known result (Oak, et al., 2019). This process is repeated on the training data until the highest possible prediction accuracy or the maximum number of iterations is reached (Prasad, et al., 2013).
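The supervised training cycle described above (predict, compare with the known output, adjust, and repeat until the accuracy target or the loop limit is reached) can be sketched as follows. This is a minimal illustration only: a simple perceptron stands in for the learner, and the feature names, values, and parameters such as `target_accuracy` and `max_loops` are assumptions, not taken from the chapter.

```python
# Illustrative sketch: a perceptron stands in for "a supervised learner".
# Features are hypothetical, pre-scaled module metrics
# (e.g. size and complexity); all values are made up.

def predict(weights, bias, features):
    """Return 1 (defective) or 0 (non-defective) for one module."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if score > 0 else 0

def accuracy(weights, bias, samples, labels):
    """Fraction of training samples whose prediction matches the known label."""
    correct = sum(predict(weights, bias, x) == y for x, y in zip(samples, labels))
    return correct / len(samples)

def train(samples, labels, target_accuracy=1.0, max_loops=100, learning_rate=0.1):
    """Repeat over the training data until the target accuracy or loop limit."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    loops_used = 0
    for loops_used in range(1, max_loops + 1):
        for x, y in zip(samples, labels):
            # Compare the prediction with the known output value.
            error = y - predict(weights, bias, x)
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
        if accuracy(weights, bias, samples, labels) >= target_accuracy:
            break
    return weights, bias, loops_used

samples = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.8, 0.9], [0.9, 0.7], [0.7, 0.8]]
labels = [0, 0, 0, 1, 1, 1]  # known outputs: 1 = defective

weights, bias, loops_used = train(samples, labels)
print(loops_used, accuracy(weights, bias, samples, labels))
```

The two stopping conditions mirror the text: the loop ends early once the accuracy target is met, and otherwise stops at `max_loops`.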

In unsupervised learning methods, the actual value of the class label output is not known beforehand. Early in the process, faulty components, such as individual units or entire classes, can be identified with the use of defect prediction modelling (Rajeyyagari, et al., 2022). This can be done by labelling the modules as either reliable or prone to errors. Different methods, including support vector classifiers (SVC), random forests, naive Bayes, decision trees (DT), and neural networks (NN), are used for this classification. The modules most prone to defects are given higher priority throughout the testing phases, whereas the modules least prone to defects are examined as resources permit (Sandeep, et al., 2022).
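As a rough sketch of labelling modules and ordering them for testing, the example below uses a small k-nearest-neighbour vote (KNN is one of the models named in the abstract). The module names, metric values, `threshold`, and the helper names `knn_defect_score` and `prioritise` are hypothetical, and the features are left unscaled for brevity, which a real pipeline would likely not do.

```python
import math
from collections import Counter

# Hypothetical labelled modules: (lines of code, cyclomatic complexity) -> label.
TRAINING = [
    ((120, 4), "reliable"),
    ((150, 5), "reliable"),
    ((200, 6), "reliable"),
    ((900, 25), "defect-prone"),
    ((1100, 30), "defect-prone"),
    ((1000, 28), "defect-prone"),
]

def knn_defect_score(features, training=TRAINING, k=3):
    """Fraction of the k nearest labelled modules that are defect-prone."""
    nearest = sorted(training, key=lambda item: math.dist(features, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes["defect-prone"] / k

def prioritise(modules, threshold=0.5):
    """Label each module and sort so the most defect-prone are tested first."""
    scored = [(name, knn_defect_score(feats)) for name, feats in modules.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [
        (name, "defect-prone" if score > threshold else "reliable", score)
        for name, score in scored
    ]

# Hypothetical unlabelled modules awaiting testing.
modules = {"parser": (950, 27), "utils": (130, 4), "network": (600, 15)}
ranking = prioritise(modules)
for name, label, score in ranking:
    print(f"{name}: {label} (score {score:.2f})")
```

Modules at the top of `ranking` would be tested first; those at the bottom would be examined as resources permit.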

A classifier establishes and examines the relationship between the attributes and the class labels of the training dataset, expressing it as rules for categorizing the targets. The class labels of future datasets must be assigned using those rules as well (Vanitha, et al., 2019). As a result, unclassified datasets can be categorized using a classifier and its classification patterns (Prasad & Chakrabarti, 2014). Because software is so widely used, defining software problems, locating defects, and recognizing them requires researchers to perform repetitive tasks. The primary objective of separating the software dataset into defective and non-defective subsets is to use it as a model for bug prediction.
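The abstract notes that the confusion matrix, precision, recall, and accuracy are used for assessment. As a minimal sketch of how these measures follow from a defective/non-defective split, the example below compares predicted labels with known ones. The label strings and sample values are assumptions for illustration, not results from the chapter.

```python
def confusion_counts(actual, predicted, positive="defective"):
    """Return (TP, FP, FN, TN), treating `positive` as the class of interest."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    return tp, fp, fn, tn

def evaluate(actual, predicted):
    """Compute precision, recall, and accuracy from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(actual, predicted)
    return {
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "accuracy": (tp + tn) / len(actual),
    }

# Made-up labels for eight modules.
actual = ["defective"] * 3 + ["non-defective"] * 5
predicted = ["defective", "defective", "non-defective",
             "defective", "non-defective", "non-defective",
             "non-defective", "non-defective"]

counts = confusion_counts(actual, predicted)
scores = evaluate(actual, predicted)
print(counts, scores)
```

Precision here is the share of modules flagged as defective that really are defective, while recall is the share of truly defective modules that were flagged.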
