Machine Learning Classification to Effort Estimation for Embedded Software Development Projects

Kazunori Iwata, Toyoshiro Nakashima, Yoshiyuki Anan, Naohiro Ishii
DOI: 10.4018/978-1-6684-3702-5.ch078

Abstract

This paper discusses the effect of classification on estimating the amount of effort (in man-days) associated with code development. Estimating the effort requirements for new software projects is especially important. Because outliers are harmful to estimation, many estimation models exclude them. However, such outliers can only be identified in practice once the projects are completed, so they should not be excluded when creating models or when estimating the required effort. This paper presents classifications for embedded software development projects using an artificial neural network (ANN) and a support vector machine (SVM). After defining the classifications, effort estimation models are created for each class using linear regression, an ANN, and a form of support vector regression (SVR). Evaluation experiments using 10-fold cross-validation compare the estimation accuracy of the models with and without the classifications. In addition, one-way analysis of variance with the Games-Howell test is performed to check whether the differences are statistically significant.

Support Vector Regression

SVR uses the same principles as the SVM for classification, with a few minor differences. The 𝜀-SVR regression method (Alex & Bernhard, 2004) uses an 𝜀-insensitive loss function to solve regression problems. It attempts to find a continuous function such that as many data points as possible lie within the 𝜀-insensitive tube around it. 𝜀-SVR has been used to estimate the amount of effort required for software projects (Oliveira, 2006), and the approach has been tested using the well-known NASA software project dataset (John & Victor, 1981; Shin & Goel, 2000). However, these studies did not investigate the parameters of 𝜀-SVR. The effectiveness of the continuous function obtained by the SVM (and SVR) depends on the kernel parameter (𝛾) and the soft margin parameter (C) (Cortes & Vapnik, 1995). In addition, the value of 𝜀 affects the estimates given by 𝜀-SVR.
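To make the role of the insensitivity tube concrete, the following minimal sketch computes the standard 𝜀-insensitive loss, max(0, |y − f(x)| − 𝜀), for a few predictions. The function name and the effort values are hypothetical and chosen only for illustration; they are not taken from the chapter or its dataset.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Return max(0, |y_true - y_pred| - epsilon) for each pair of values."""
    return [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]


# Hypothetical effort values in man-days (illustrative only).
actual = [100.0, 120.0, 80.0]
predicted = [104.0, 131.0, 80.0]

# With epsilon = 5, errors of 4 and 0 fall inside the tube and cost nothing;
# the error of 11 exceeds the tube by 6.
losses = epsilon_insensitive_loss(actual, predicted, epsilon=5.0)
print(losses)  # [0.0, 6.0, 0.0]
```

A larger 𝜀 widens the tube and ignores more of the residuals, which is one reason its value can change the resulting estimates.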

We previously proposed a three-dimensional grid search to find the most appropriate combination of these three parameters (𝛾, C, and 𝜀) (Iwata, Liebman, Stone, Nakashima, Anan & Ishii, 2015). Our method improved the mean magnitude of relative error (MMRE, see Equation (3) in the section “Evaluation Criteria”) from 0.165 (Cortes & Vapnik, 1995) to 0.149 using leave-one-out cross-validation (Shin & Goel, 2000).
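The idea of a grid search over three parameters scored by leave-one-out MMRE can be sketched as follows. This is not the authors' implementation: it assumes scikit-learn's `SVR` with an RBF kernel, the common MMRE definition (mean of |actual − predicted| / actual, which may differ in detail from the chapter's Equation (3)), synthetic data rather than the NASA dataset, and arbitrary candidate values for 𝛾, C, and 𝜀. The helper names `mmre`, `loo_mmre`, and `grid_search_3d` are hypothetical.

```python
from itertools import product

import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.svm import SVR


def mmre(actual, predicted):
    """Mean magnitude of relative error (assumed standard definition)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted) / actual))


def loo_mmre(X, y, gamma, C, epsilon):
    """Score one (gamma, C, epsilon) triple with leave-one-out cross-validation."""
    preds = np.empty_like(y, dtype=float)
    for train_idx, test_idx in LeaveOneOut().split(X):
        model = SVR(kernel="rbf", gamma=gamma, C=C, epsilon=epsilon)
        model.fit(X[train_idx], y[train_idx])
        preds[test_idx] = model.predict(X[test_idx])
    return mmre(y, preds)


def grid_search_3d(X, y, gammas, Cs, epsilons):
    """Evaluate every combination on the 3-D grid and return the lowest MMRE."""
    scores = {
        (g, c, e): loo_mmre(X, y, g, c, e)
        for g, c, e in product(gammas, Cs, epsilons)
    }
    best = min(scores, key=scores.get)
    return best, scores[best], scores


# Synthetic, strictly positive "effort" values (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(20, 2))
y = 50.0 + 100.0 * X[:, 0] + 30.0 * X[:, 1]

best_params, best_score, all_scores = grid_search_3d(
    X, y, gammas=[0.1, 1.0], Cs=[10.0, 100.0], epsilons=[0.1, 1.0]
)
print(best_params, round(best_score, 4))
```

The cost grows with the product of the three candidate lists times the number of projects, so leave-one-out scoring is practical mainly for small datasets.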
