Missing Value Imputation Using ANN Optimized by Genetic Algorithm

Anjana Mishra, Bighnaraj Naik, Suresh Kumar Srichandan
Copyright: © 2018 | Pages: 17
DOI: 10.4018/IJAIE.2018070104

Abstract

Missing values arise in almost all serious statistical analyses and create numerous problems when processing data in databases. In real-world applications, information may be missing because of instrument errors, optional fields, non-response to some survey questions, data entry errors, and so on. Most data mining techniques require complete data without any missing information, which has led researchers to develop efficient methods for handling missing values. Handling missing data has long been one of the most important areas of research in various domains. The objective of this article is to handle missing data using an evolutionary (genetic) algorithm, together with some relatively simple methodologies that can often yield reasonable results. The proposed method uses a genetic algorithm and a multi-layer perceptron (MLP) to predict missing data more accurately.
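As a rough illustration of how a genetic algorithm and an MLP can work together for imputation, the following Python sketch evolves the weights of a small one-hidden-layer network so that it predicts one attribute from the others, then uses the best network to fill the gaps. It is not the authors' implementation: the layer sizes, the GA operators (truncation selection with elitism, uniform crossover, Gaussian mutation), the toy data, and all names such as `ga_fit` and `mlp_predict` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

N_IN, N_HID = 2, 4                        # hypothetical layer sizes
N_W = N_IN * N_HID + N_HID + N_HID + 1    # weights and biases packed in one vector

def mlp_predict(w, X):
    """One-hidden-layer MLP whose parameters are unpacked from the flat vector w."""
    i = N_IN * N_HID
    W1 = w[:i].reshape(N_IN, N_HID)       # input-to-hidden weights
    b1 = w[i:i + N_HID]                   # hidden biases
    W2 = w[i + N_HID:i + 2 * N_HID]       # hidden-to-output weights
    b2 = w[-1]                            # output bias
    return np.tanh(X @ W1 + b1) @ W2 + b2

def mse(w, X, y):
    return float(np.mean((mlp_predict(w, X) - y) ** 2))

def ga_fit(X, y, pop_size=40, generations=150, sigma=0.1):
    """Evolve MLP weight vectors; fitness is the training mean squared error."""
    pop = rng.normal(0.0, 1.0, size=(pop_size, N_W))
    history = []
    for _ in range(generations):
        errors = np.array([mse(w, X, y) for w in pop])
        history.append(errors.min())
        # Truncation selection: the better half survives unchanged (elitism).
        elite = pop[np.argsort(errors)[:pop_size // 2]]
        n_children = pop_size - len(elite)
        pa = elite[rng.integers(len(elite), size=n_children)]
        pb = elite[rng.integers(len(elite), size=n_children)]
        # Uniform crossover followed by Gaussian mutation.
        mask = rng.random(pa.shape) < 0.5
        children = np.where(mask, pa, pb) + rng.normal(0.0, sigma, size=pa.shape)
        pop = np.vstack([elite, children])
    errors = np.array([mse(w, X, y) for w in pop])
    history.append(errors.min())
    return pop[np.argmin(errors)], history

# Toy data (an assumption): the third attribute depends on the first two.
data = rng.uniform(-1.0, 1.0, size=(60, 3))
data[:, 2] = 0.5 * data[:, 0] - 0.3 * data[:, 1]
missing = np.zeros(60, dtype=bool)
missing[::6] = True                       # hide every sixth value of attribute 3
data[missing, 2] = np.nan

# Train on complete records, then impute the hidden values.
complete = ~missing
best_w, history = ga_fit(data[complete, :2], data[complete, 2])
imputed = data.copy()
imputed[missing, 2] = mlp_predict(best_w, data[missing, :2])
```

Because the better half of each generation is carried over unchanged, the best training error recorded in `history` never increases from one generation to the next. A practical system would also hold out known values to estimate imputation error, which this sketch omits.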

Literature Review

This section gives a brief survey of the literature on missing value imputation methods, along with their merits and demerits.

Amiri and Jensen (2016) proposed three missing value imputation methods based on fuzzy-rough sets, namely implicator/t-norm based fuzzy-rough sets, vaguely quantified rough sets, and ordered weighted average based rough sets. Each is combined with the nearest neighbor algorithm to gain both the simplicity and accuracy of nearest neighbor prediction and the robustness and noise tolerance of fuzzy-rough sets. The three resulting algorithms are the Fuzzy-Rough Nearest Neighbor Imputation algorithm (FRNNI), the Ordered Weighted Average-based Nearest Neighbor Imputation algorithm (OWANNI), and the Vaguely Quantified Nearest Neighbor Imputation algorithm (VQNNI). The algorithms were compared with each other and with 11 existing methods: Bayesian PCA (BPCAI), Concept Most Common (CMCI), Fuzzy K-Means (FKMI), K-Means (KMI), KNN Impute (KNNI), LLS Impute (LLSI), Most Common (MCI), SVD Impute (SVDI), SVM Impute (SVMI), WKNN Impute (WKNNI), and Expectation Maximization (EMI). On 27 benchmark datasets, FRNNI was found to perform better than the other two proposed methods and the 11 existing methods.
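To make the nearest neighbor idea concrete, the sketch below shows plain k-nearest-neighbor imputation in Python, in the spirit of the KNNI baseline named above. It is not FRNNI, OWANNI, or VQNNI: the fuzzy-rough weighting is omitted entirely, and the function name `knn_impute`, the choice of Euclidean distance, the use of only complete rows as donors, and the sample data are assumptions made for illustration.

```python
import numpy as np

def knn_impute(X, k=3):
    """Fill NaNs in each incomplete row with the mean of its k nearest complete rows."""
    X = np.asarray(X, dtype=float)
    out = X.copy()
    has_gap = np.isnan(X).any(axis=1)
    donors = X[~has_gap]                      # donor pool: rows with no missing values
    for i in np.flatnonzero(has_gap):
        obs = ~np.isnan(X[i])                 # attributes observed in this row
        # Euclidean distance to every donor, using the observed attributes only.
        dist = np.sqrt(((donors[:, obs] - X[i, obs]) ** 2).sum(axis=1))
        nearest = donors[np.argsort(dist)[:k]]
        out[i, ~obs] = nearest[:, ~obs].mean(axis=0)
    return out

# Hypothetical sample: the last record is missing its third attribute.
X = np.array([
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 3.2],
    [0.9, 1.9, 2.8],
    [8.0, 9.0, 10.0],
    [1.0, 2.0, np.nan],
])
filled = knn_impute(X, k=3)
```

Here the three closest donors to the incomplete record are the first three rows, so the gap is filled with the mean of their third attribute, and the distant fourth row has no influence. The fuzzy-rough variants replace this unweighted mean with similarity-based lower and upper approximations.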

Deb and Liew (2016) proposed an algorithm for imputing missing numerical or categorical values in traffic accident databases. For the estimation, the authors considered four publicly available traffic accident databases from the United States: the first from the largest open federal database (explore.data.gov); the second from the National Incident Based Reporting System (NIBRS) of the city and county of Denver (data.opencolorado.org); and the third and fourth, "Motor Vehicle Crashes - case information: 2011" and "Motor Vehicle Crashes - individual information: 2011", from the New York open data portal (data.ny.gov). The algorithm uses a decision tree to find sets of interrelated records, and this sampling-based missing value imputation algorithm is named DSMI. The large data set is divided horizontally based on the non-missing attributes of each record. Missing values are then imputed using the link between the missing and non-missing attributes, captured by the IS measure, and the direct and transitive relationships of attribute values across two records, captured by weighted similarity measures. The proposed algorithm is more accurate than existing algorithms on datasets where a large number of attributes are categorical.
