Literature Review
In this section, we provide a brief survey of the literature on missing value imputation methods, noting their merits and demerits.
Amiri and Jensen (2016) proposed three missing value imputation methods based on fuzzy-rough sets: implicator/t-norm based fuzzy-rough sets, vaguely quantified rough sets, and ordered weighted average based rough sets. Each is combined with the nearest neighbor algorithm so as to benefit from the simplicity and accuracy of nearest neighbor prediction together with the robustness and noise tolerance of fuzzy-rough sets. The three resulting algorithms are the Fuzzy-Rough Nearest Neighbor Imputation algorithm (FRNNI), the Ordered Weighted Average-based Nearest Neighbor Imputation algorithm (OWANNI) and the Vaguely Quantified Nearest Neighbor Imputation algorithm (VQNNI). The three algorithms were compared with each other and with 11 existing methods on 27 benchmark datasets: Bayesian PCA (BPCAI), Concept Most Common (CMCI), Fuzzy K-Means (FKMI), K-Means (KMI), KNN impute (KNNI), LLS impute (LLSI), Most Common (MCI), SVD impute (SVDI), SVM impute (SVMI), WKNN impute (WKNNI) and Expectation Maximization (EMI). FRNNI was found to perform better than the other two proposed methods and the 11 existing methods.
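To make the nearest neighbor side of these methods concrete, the following is a minimal sketch of plain similarity-weighted nearest neighbor imputation for numeric data. It is not FRNNI, OWANNI or VQNNI: the fuzzy-rough lower and upper approximations are not reproduced, and the function name `knn_impute`, the parameter `k` and the weighting `1 / (1 + distance)` are assumptions chosen only for illustration.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with a similarity-weighted mean of the k nearest complete rows.

    Illustrative baseline only; the weighting scheme is an assumption,
    not the fuzzy-rough approach of Amiri and Jensen (2016).
    """
    complete = [r for r in rows if None not in r]
    result = []
    for row in rows:
        if None not in row:
            result.append(list(row))
            continue
        observed = [i for i, v in enumerate(row) if v is not None]
        # Euclidean distance over the attributes observed in the incomplete row.
        scored = sorted(
            ((math.sqrt(sum((row[i] - c[i]) ** 2 for i in observed)), c)
             for c in complete),
            key=lambda pair: pair[0],
        )
        neighbours = scored[:k]
        if not neighbours:
            # No complete row is available, so the gaps are left as they are.
            result.append(list(row))
            continue
        weights = [1.0 / (1.0 + d) for d, _ in neighbours]
        filled = list(row)
        for j, v in enumerate(row):
            if v is None:
                weighted_sum = sum(w * c[j] for w, (_, c) in zip(weights, neighbours))
                filled[j] = weighted_sum / sum(weights)
        result.append(filled)
    return result


rows = [[1.0, 2.0], [3.0, 4.0], [1.0, None]]
print(knn_impute(rows, k=1))
```

Closer neighbors receive larger weights, so an incomplete record that matches a complete record exactly on its observed attributes is pulled most strongly toward that record's values.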
Deb and Liew (2016) proposed an algorithm for imputing missing numerical or categorical values in traffic accident databases. The algorithm was evaluated on four publicly available traffic accident databases from the United States: the first from explore.data.gov, the largest open federal database; the second from data.opencolorado.org, the National Incident Based Reporting System (NIBRS) of the city and county of Denver; and the third and fourth, MotorVehicleCrashes-caseinformation:2011 and MotorVehicleCrashes-individualinformation:2011, from data.ny.gov, the New York open data portal. The proposed algorithm, a sampling-based missing value imputation algorithm named DSMI, uses a decision tree to find sets of interrelated records. The large data set is divided horizontally based on the non-missing attributes of each record. The missing values are then imputed from the link between missing and non-missing attributes, measured with the IS measure, and from the direct and transitive relationships of attribute values across two records, measured with weighted similarity measures. The proposed algorithm achieves better accuracy than the existing algorithm on datasets in which a large number of attributes are categorical.
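The horizontal division step can be illustrated with a short sketch that groups records by the set of attributes they have observed. This is one plausible reading of that step only: it does not reproduce the decision tree, the IS measure or the weighted similarity measures of DSMI, and the function name `partition_by_observed` and the sample attribute names are hypothetical.

```python
from collections import defaultdict


def partition_by_observed(records):
    """Group records (dicts) by the set of attribute names whose value is not None.

    Hypothetical helper for illustration; not the DSMI partitioning procedure.
    """
    groups = defaultdict(list)
    for rec in records:
        key = frozenset(name for name, value in rec.items() if value is not None)
        groups[key].append(rec)
    return dict(groups)


# Made-up sample records with a numeric and a categorical attribute.
records = [
    {"speed": 50, "weather": "rain"},
    {"speed": None, "weather": "clear"},
    {"speed": 30, "weather": "fog"},
]
for observed, group in partition_by_observed(records).items():
    print(sorted(observed), len(group))
```

Records that share the same observed attributes end up in the same partition, so any later similarity computation within a partition can rely on a common set of non-missing attributes.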