Auto Associative Extreme Learning Machine Based Hybrids for Data Imputation

Chandan Gautam (Institute for Development and Research in Banking Technology, India) and Vadlamani Ravi (Institute for Development and Research in Banking Technology, India)
DOI: 10.4018/978-1-5225-0997-4.ch005


This chapter presents three novel hybrid techniques for data imputation: (1) Auto-associative Extreme Learning Machine (AAELM) with Principal Component Analysis (PCA) (PCA-AAELM), (2) Gray system theory (GST) + AAELM with PCA (Gray+PCA-AAELM), and (3) AAELM with Evolving Clustering Method (ECM) (ECM-AAELM). Our prime concern is to remove, with the help of ECM and PCA, the randomness in AAELM caused by its random weights. This chapter also proposes local learning by invoking ECM as a preprocessor for AAELM. The proposed methods are tested on several regression, classification and bank datasets using 10-fold cross-validation. The results, in terms of Mean Absolute Percentage Error (MAPE), are compared with those of K-Means+Multilayer perceptron (MLP) imputation (Ankaiah & Ravi, 2011), K-Medoids+MLP, K-Means+GRNN, K-Medoids+GRNN (Nishanth & Ravi, 2013), PSO_Covariance imputation (Krishna & Ravi, 2013) and ECM-Imputation (Gautam & Ravi, 2014). It is concluded that the proposed methods achieved better imputation on most of the datasets, as evidenced by the Wilcoxon signed rank test.
Chapter Preview


Missing data can be observed in many datasets that have been collected in real time. It can occur for many reasons: people sometimes do not answer every query in a survey because of privacy concerns, and data entry operators sometimes leave fields blank through lack of concentration or for other reasons. Failure of a system, or of sensor nodes in a wireless sensor network, can also lead to missing data. Missing data is a very challenging issue in the field of analytics because the completeness and quality of the data always play a crucial role in analyzing the available data. Replacing a missing value with an appropriate value is called imputation. In general, data mining algorithms are not capable of handling data incompleteness on their own. So, it is necessary to impute the missing values with appropriate values using a suitable data imputation algorithm (Ankaiah & Ravi, 2011; Abdella & Marwala, 2005; García & Kalenatic, 2011; Nishanth, Ravi, Ankaiah & Bose, 2012).

Kline (1988) proposed the following procedures to handle missing data:

  1. Deletion procedure viz., Listwise deletion and Pairwise deletion (Song & Shepperd, 2007),

  2. Imputation procedure (Schafer, 1997),

  3. Model based procedure, and

  4. Machine learning methods.
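
The difference between the two deletion procedures in item 1 can be illustrated with a minimal sketch. The toy records and the helper names `listwise_delete` and `pairwise_records` are hypothetical and are not taken from the chapter; `None` marks a missing value.

```python
# Toy records (hypothetical data); None marks a missing value.
rows = [
    [1.0, 2.0, 3.0],
    [2.0, None, 6.0],
    [3.0, 6.0, None],
    [4.0, 8.0, 12.0],
]

def listwise_delete(data):
    """Listwise deletion: keep only records with no missing attribute."""
    return [r for r in data if all(v is not None for v in r)]

def pairwise_records(data, i, j):
    """Pairwise deletion: for a statistic on attributes i and j,
    keep every record in which both of these attributes are present."""
    return [(r[i], r[j]) for r in data if r[i] is not None and r[j] is not None]

complete = listwise_delete(rows)          # records usable for any analysis
pairs_01 = pairwise_records(rows, 0, 1)   # records usable for attributes 0 and 1
pairs_12 = pairwise_records(rows, 1, 2)   # records usable for attributes 1 and 2
```

Listwise deletion discards a whole record for a single missing attribute, while pairwise deletion retains a record for every attribute pair it can still serve, so the usable sample size differs from one statistic to the next.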

The remainder of this chapter is organized as follows. First, a brief review of the literature on imputation of missing data is presented. Next, the proposed methods are explained. The datasets and the experimental design are then described. Results and discussion are presented in the penultimate section, and the last section concludes the chapter.



In the case of numerical attributes, missing data can be handled in various ways. Several types of approach are possible, such as machine learning (ML) based approaches, deletion of missing values, and model-based approaches. ML-based approaches include auto-associative neural network imputation with genetic algorithms (Abdella & Marwala, 2005), SOM (Merlin, Sorjamaa, Maillet & Lendasse, 2010), multi-layer perceptron (Gupta & Lam, 1996), K-Nearest Neighbor (Batista & Monard, 2002), and fuzzy-neural network (Gabrys, 2002). Batista and Monard (2002, 2003) and Jerez, Molina, Subirates and Franco (2006) employed K-nearest neighbour (K-NN) for handling missing data. Liu and Zhang (2012) proposed the mutual K-NN method to classify noisy and incomplete data. For handling missing data, Samad and Harp (1992) employed a SOM-based approach and Austin and Escobar (2005) employed Monte Carlo simulations. Several studies employed the multi-layer perceptron (MLP) for imputation: the MLP is trained as an autoassociative model on records without missing attributes, and records with missing attributes are then passed to the trained model for imputation (Sharpe & Solly, 1995; Nordbotten, 1996; Gupta & Lam, 1996; Yoon & Lee, 1999; Silva-Ramírez, Pino-Mejías, López-Coello & Cubiles-de-la-Vega, 2011; Nkuna & Odiyo, 2011). The auto-associative neural network (AANN) has also been employed for this task by keeping the input and output variables identical (Marseguerra & Zoia, 2002; Marwala & Chakraverty, 2006). Ragel and Cremilleux (1999) employed the Robust Association Rules (RAR) algorithm to address multiple missing values in databases. Chen, Huang, F. Tian and S. Tian (2008) proposed a selective Bayes classifier to handle missing data. The fuzzy c-means algorithm was employed by Nouvo (2011) to handle incomplete data. Principles of chaos theory were employed by Elshorbagy, Simonovic and Panu (2002) to handle missing values in stream flow data.

The expectation maximization (EM) algorithm was employed by Dempster, Laird and Rubin (1977) to handle missing values in multivariate data. García and Kalenatic (2011) proposed a genetic algorithm (GA) based approach to handle missing attributes in multivariate data. Ankaiah and Ravi (2011) handled missing data using a two-stage hybrid method: K-means is employed in the first stage and MLP in the second.
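
As an illustration of the K-NN family of imputation methods cited above, the following is a minimal sketch and not a reproduction of any cited study. It assumes the common variant in which a missing attribute is replaced by the mean of that attribute over the k nearest complete records, with distance measured on the observed attributes only. The helper names, the toy data, and the standard MAPE definition used for evaluation are assumptions.

```python
import math

def knn_impute(record, complete, k=2):
    """Fill None entries of `record` with the mean of the k nearest complete records.
    Distance is Euclidean, computed over the observed attributes only (assumption)."""
    observed = [i for i, v in enumerate(record) if v is not None]

    def dist(row):
        return math.sqrt(sum((row[i] - record[i]) ** 2 for i in observed))

    neighbours = sorted(complete, key=dist)[:k]
    return [
        v if v is not None else sum(n[i] for n in neighbours) / len(neighbours)
        for i, v in enumerate(record)
    ]

def mape(actual, predicted):
    """Mean Absolute Percentage Error, standard definition (actual values must be non-zero)."""
    return 100.0 / len(actual) * sum(abs((a - p) / a) for a, p in zip(actual, predicted))

# Hypothetical complete records: (attribute 0, attribute 1)
complete = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [10.0, 100.0]]

filled = knn_impute([2.4, None], complete, k=2)   # neighbours are [2, 20] and [3, 30]
error = mape([24.0], [filled[1]])                 # 24.0 is the assumed true value
```

In practice the attributes would be normalized before distances are computed, so that no single attribute dominates the neighbour search.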

Key Terms in this Chapter

Evolving Clustering Method (ECM): A fast, one-pass clustering method based on normalized Euclidean distances. It can be applied in two modes: on-line and off-line. Each data point is processed exactly once; no iteration over previously processed data is required.
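
The one-pass idea can be sketched as follows. This is a simplified illustration based on the commonly described on-line ECM update rule, not the chapter's implementation; the distance threshold name `dthr`, the helper names, and the toy points are assumptions.

```python
import math

def norm_dist(x, c):
    """Normalized Euclidean distance between two points of dimension q."""
    q = len(x)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, c)) / q)

def ecm(points, dthr):
    """One-pass clustering: every point is visited exactly once."""
    centers, radii = [], []
    for x in points:
        if not centers:                      # first point starts the first cluster
            centers.append(list(x))
            radii.append(0.0)
            continue
        d = [norm_dist(x, c) for c in centers]
        # Case 1: the point lies inside an existing cluster, nothing changes.
        if any(dj <= rj for dj, rj in zip(d, radii)):
            continue
        # Case 2: choose the cluster with the smallest distance + radius.
        s = [dj + rj for dj, rj in zip(d, radii)]
        a = min(range(len(s)), key=s.__getitem__)
        if s[a] > 2 * dthr:                  # too far from everything: new cluster
            centers.append(list(x))
            radii.append(0.0)
        else:                                # enlarge cluster a and shift its center
            new_r = s[a] / 2
            ratio = (d[a] - new_r) / d[a]    # x ends up on the new boundary
            centers[a] = [c + ratio * (xi - c) for c, xi in zip(centers[a], x)]
            radii[a] = new_r
    return centers, radii

points = [(0.0, 0.0), (0.1, 0.1), (5.0, 5.0), (5.2, 5.0)]
centers, radii = ecm(points, dthr=0.5)
```

Because no point is revisited, the result depends on the order in which the points arrive and on the chosen threshold.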

Gray System Theory (GST): Gray Relational Analysis (GRA) is a method of Gray System Theory (GST) that measures the degree of similarity between two systems. Two quantities need to be calculated for GRA: the Gray Relational Coefficient (GRC) and the Gray Relational Grade (GRG). A larger GRG indicates that two systems or elements are more similar, and a smaller value indicates that they are less similar.
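
A minimal sketch of the two quantities follows, assuming the widely used formulation in which the GRC at position k is (d_min + rho * d_max) / (d(k) + rho * d_max), with d(k) the absolute difference to the reference sequence, and the GRG is the mean of the GRCs. The distinguishing coefficient `rho = 0.5`, the function name, and the toy sequences are assumptions, and the chapter may use a different variant.

```python
def gray_relational_grades(reference, candidates, rho=0.5):
    """Return one Gray Relational Grade per candidate sequence.
    Inputs are assumed to be already normalized to comparable scales."""
    # Absolute differences between the reference and each candidate.
    deltas = [[abs(r - c) for r, c in zip(reference, cand)] for cand in candidates]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:                      # all candidates equal the reference
        return [1.0] * len(candidates)
    grades = []
    for row in deltas:
        # Gray Relational Coefficient at each position.
        grc = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        # Gray Relational Grade: mean of the coefficients.
        grades.append(sum(grc) / len(grc))
    return grades

reference = [1.0, 2.0, 3.0]
candidates = [[1.0, 2.1, 2.9], [2.0, 4.0, 1.0]]
grades = gray_relational_grades(reference, candidates)
```

Here the first candidate tracks the reference closely and therefore receives the larger grade.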

Autoassociative Neural Network: Also called an autoencoder. It is generally used to learn a representation of a dataset as well as for dimensionality reduction. In an autoassociative neural network the target output is identical to the input, i.e. the network tries to reconstruct its input at the output layer.
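
To make the idea concrete, the following is a minimal sketch of a plain auto-associative extreme learning machine with random hidden weights, i.e. the baseline whose randomness the chapter aims to remove, not the proposed PCA- or ECM-based hybrids. The data, the number of hidden nodes, the sigmoid activation, and the mean placeholder used for the missing entry are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed: plain ELM output varies with the random weights

# Hypothetical complete training records with strongly related columns.
t = rng.uniform(0.0, 1.0, size=(200, 1))
X = np.hstack([t, 2 * t, 1 - t])

n_hidden = 10
W = rng.normal(size=(X.shape[1], n_hidden))   # random input-to-hidden weights (never trained)
b = rng.normal(size=n_hidden)                 # random hidden biases

def hidden(A):
    """Sigmoid hidden-layer output for a batch of records."""
    return 1.0 / (1.0 + np.exp(-(A @ W + b)))

# Output weights by least squares; the target is the input itself.
beta = np.linalg.pinv(hidden(X)) @ X
train_mse = float(np.mean((hidden(X) @ beta - X) ** 2))

def impute(record, missing_idx, fill):
    """Put a placeholder in the missing slot, reconstruct, keep the reconstructed value."""
    x = np.array(record, dtype=float)
    x[missing_idx] = fill
    recon = hidden(x[None, :]) @ beta
    x[missing_idx] = recon[0, missing_idx]
    return x

# Attribute 1 is missing (the 0.0 is ignored); the column mean serves as placeholder.
result = impute([0.3, 0.0, 0.7], missing_idx=1, fill=X[:, 1].mean())
```

Only `beta` is learned, in a single least-squares step; changing the seed changes `W` and `b` and hence the imputed value. The quality of the estimate also depends on the placeholder, since the network is only fitted to complete records.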

Principal Component Analysis (PCA): A very popular dimensionality reduction technique. It converts correlated variables into linearly uncorrelated variables (principal components) whose directions are orthogonal to each other. Each principal component is a linear combination of the original, correlated variables. It is therefore a dimensionality reduction technique rather than a feature selection technique.
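
These properties can be checked numerically with a minimal sketch, assuming PCA is computed through the eigendecomposition of the sample covariance matrix. The synthetic data and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: columns 0 and 1 are strongly correlated, column 2 is independent.
a = rng.normal(size=200)
X = np.column_stack([a, 0.8 * a + 0.2 * rng.normal(size=200), rng.normal(size=200)])

Xc = X - X.mean(axis=0)                      # center each variable
cov = np.cov(Xc, rowvar=False)               # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)       # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]            # sort by explained variance, descending
eigvals, components = eigvals[order], eigvecs[:, order]

# Each score column is a linear combination of all original variables.
scores = Xc @ components
score_cov = np.cov(scores, rowvar=False)     # diagonal: the components are uncorrelated

reduced = scores[:, :2]                      # keep the two leading components
```

Every column of `reduced` mixes all three original variables, which is why PCA reduces dimensionality without selecting any particular original feature.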
