These are statistical methods that repeatedly draw new samples from an original data sample in order to approximate the distribution of the population from which the sample was drawn. They are used in various procedures, such as computing confidence intervals and performing statistical tests. Common resampling techniques include the bootstrap, the jackknife, and permutation tests.
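As an illustration of one of the techniques named above, the following is a minimal sketch of a percentile bootstrap confidence interval for a sample mean. It is not taken from the chapter: the function name `bootstrap_ci`, its parameters, and the sample values are hypothetical and chosen only for demonstration, and the percentile method is just one of several ways to build a bootstrap interval.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(data).

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    # Each resample draws n values from the original sample WITH replacement,
    # and the statistic is recomputed on every resample.
    estimates = sorted(
        stat(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    # The interval endpoints are the empirical alpha/2 and 1 - alpha/2
    # quantiles of the resampled statistics.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]


# Made-up sample values, used only to exercise the function.
sample = [12.1, 9.8, 11.4, 10.3, 13.7, 8.9, 10.8, 12.5, 9.5, 11.9]
low, high = bootstrap_ci(sample)
print(f"95% bootstrap CI for the mean: ({low:.2f}, {high:.2f})")
```

Passing a different function as `stat` (for example `statistics.median`) gives an interval for that statistic without changing the resampling logic, which is the main practical appeal of the bootstrap.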
Published in Chapter:
A Framework of Statistical and Visualization Techniques for Missing Data Analysis in Software Cost Estimation
Lefteris Angelis (Aristotle University of Thessaloniki, Greece), Nikolaos Mittas (Aristotle University of Thessaloniki, Greece), and Panagiota Chatzipetrou (Aristotle University of Thessaloniki, Greece)
Copyright: © 2015
Pages: 27
DOI: 10.4018/978-1-4666-6359-6.ch003
Abstract
Software Cost Estimation (SCE) is a critical phase in software development projects. However, due to the growing complexity of the software itself, a common problem in building software cost models is that the available datasets contain lots of missing categorical data. The purpose of this chapter is to show how a framework of statistical, computational, and visualization techniques can be used to evaluate and compare the effect of missing data techniques on the accuracy of cost estimation models. Hence, the authors use five missing data techniques: Multinomial Logistic Regression, Listwise Deletion, Mean Imputation, Expectation Maximization, and Regression Imputation. The evaluation and the comparisons are conducted using Regression Error Characteristic curves, which provide visual comparison of different prediction models, and Regression Error Operating Curves, which examine predictive power of models with respect to under- or over-estimation.