A Generic Data Mining Model for Software Cost Estimation Based on Novel Input Selection Procedure

Zahid Hussain Wani (University of Kashmir, Srinagar, India), Kaiser J. Giri (Islamic University of Science & Technology, Awantipora, India) and Rumaan Bashir (Islamic University of Science & Technology, Awantipora, India)
Copyright: © 2019 | Pages: 17
DOI: 10.4018/IJIRR.2019010102


It is always preferable for an estimation model to be inclusive, because the accuracy of an estimation model inherently lies in its inclusiveness. Software cost estimation is the prediction of the development effort and time required to develop a software project. Being predictive in nature, it demands inclusiveness, which in turn brings accuracy. In this study, a generic model for software cost estimation based on an input selection procedure is proposed. The proposed model brings inclusiveness into existing data mining techniques for software cost estimation by carefully choosing a subset of highly relevant project attributes and ignoring the less relevant ones. A diverse set of data mining techniques for software cost estimation is considered, and each is evaluated on five data sets both before and after being passed through the proposed procedure. The results show that the techniques obtained after applying the proposed procedure give more accurate software cost estimates.


Software cost estimation (SCE) covers both the development effort and the calendar time required to develop a software project. Development effort includes the number of working hours and the number of workers, in terms of the staffing levels needed for the project. Software development organizations often struggle to estimate effort and development time, largely because of the elusive nature of software as a product (Chaos Report, 2009). Most often, the effort needed for a new software project is estimated by comparing its attributes, such as the number of lines of code, the development platform and the experience of the development team, with those of previous projects; management then uses the data of the past project that fits best to estimate the current one. Following this estimation process at the individual level lets each project manager evaluate project progress effectively and gives the manager better cost control and delivery accuracy. At the management level, or more broadly, it helps the organization improve the planning and utilization of personnel and make more accurate tendering bids (Jørgensen & Wallace, 2000), and ultimately helps the organization schedule its future projects better. Given the importance of SCE, a large number of software cost estimation models have been introduced over the past 30 years, but unfortunately none of them fully meets the required need. In this paper, therefore, we propose a general framework for building new estimation models from existing ones simply by bringing inclusiveness into them.
Inclusiveness is achieved by removing the less predictive and less important attributes from the project data on which these models are trained and evaluated. The results indicate that this inclusiveness improves the software cost estimation accuracy of these models.
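The comparison with previous projects described above is essentially estimation by analogy. The following Python sketch illustrates the idea under stated assumptions: attributes are numeric, each attribute is min-max scaled to [0, 1], similarity is Euclidean distance, and only the single closest past project (k = 1) is used. The function names `normalise` and `estimate_by_analogy` and the sample attributes (KLOC, team size, team experience in years) are hypothetical and not taken from the paper.

```python
import math

def normalise(rows):
    """Min-max scale each attribute column to [0, 1]; constant columns become 0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def estimate_by_analogy(history, efforts, new_project):
    """Return the effort of the past project closest to new_project (k = 1).

    history: attribute vectors of completed projects (hypothetical layout)
    efforts: actual effort of each completed project, same order as history
    """
    # Scale the new project together with the history so all share one range.
    scaled = normalise(history + [new_project])
    target, past = scaled[-1], scaled[:-1]
    # Euclidean distance from the new project to every past project.
    distances = [math.dist(target, p) for p in past]
    nearest = min(range(len(past)), key=lambda i: distances[i])
    return efforts[nearest]

# Hypothetical attributes: [KLOC, team size, team experience in years]
history = [[10, 3, 5], [50, 1, 2], [12, 3, 4]]
efforts = [24, 120, 30]  # person-months
print(estimate_by_analogy(history, efforts, [11, 3, 4]))  # prints 30
```

With these made-up figures the third past project is the closest match, so its recorded effort of 30 person-months is returned. Scaling first keeps a large-valued attribute such as KLOC from swamping the others in the distance.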

The rest of the paper is organized as follows. Section 2 presents an overview of the existing literature. Section 3 discusses the techniques already used in software cost estimation. Section 4 describes the data sets on which the techniques of Section 3 are assessed and evaluated. Section 5 presents the proposed generic backward input selection scheme for identifying the highly relevant attributes, after which all the techniques of Section 3 are run again on the data sets of Section 4. Section 6 reports the evaluation criteria and the SCE results of the newly generated data mining techniques, and analyzes these results against those obtained in Section 4. Section 7 concludes the paper.
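Backward input selection, as named in the outline above, can be sketched generically as backward elimination: start from all attributes, repeatedly drop the attribute whose removal hurts accuracy least, and stop once every possible removal makes accuracy worse. The Python sketch below is not the authors' Section 5 procedure, whose details are not given here. It assumes a leave-one-out 1-nearest-neighbour analogy estimator as the model and the mean magnitude of relative error (MMRE) as the accuracy measure; both choices, the function names `loo_mmre` and `backward_select`, and the toy data are illustrative assumptions.

```python
import math

def loo_mmre(rows, efforts, keep):
    """Leave-one-out MMRE of a 1-NN analogy estimator using only attributes in `keep`."""
    sub = [[row[i] for i in keep] for row in rows]
    cols = list(zip(*sub))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]  # avoid dividing by zero
    scaled = [[(v - lo) / s for v, lo, s in zip(r, lows, spans)] for r in sub]
    errors = []
    for i, target in enumerate(scaled):
        # Estimate project i from its nearest neighbour among the other projects.
        others = [j for j in range(len(scaled)) if j != i]
        nearest = min(others, key=lambda j: math.dist(target, scaled[j]))
        errors.append(abs(efforts[i] - efforts[nearest]) / efforts[i])
    return sum(errors) / len(errors)

def backward_select(rows, efforts):
    """Greedy backward elimination; returns kept attribute indices and their MMRE."""
    keep = list(range(len(rows[0])))
    best = loo_mmre(rows, efforts, keep)
    while len(keep) > 1:
        # Score every subset that drops exactly one of the remaining attributes.
        trials = [
            (loo_mmre(rows, efforts, [k for k in keep if k != drop]), drop)
            for drop in keep
        ]
        err, drop = min(trials)
        if err > best:  # every removal makes the estimate worse: stop
            break
        keep.remove(drop)
        best = err
    return keep, best

# Toy data: attribute 0 (size) tracks effort, attribute 1 is noise.
rows = [[10, 7], [20, 1], [30, 9], [40, 2], [50, 8]]
efforts = [10, 20, 30, 40, 50]
selected, mmre = backward_select(rows, efforts)
print(selected)  # prints [0]
```

An attribute is still dropped when its removal leaves the error unchanged, which favours the smaller attribute set. For brevity the scaling is computed over all projects, including the one being held out; a stricter evaluation would rescale inside each leave-one-out fold.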
