Statistical Learning Methods for Classification and Prediction of Groundwater Quality Using a Small Data Record


Mohamad Sakizadeh (Shahid Rajaee Teacher Training University, Tehran, Iran) and Hassan Rahmatinia (Shahid Rajaee Teacher Training University, Tehran, Iran)
DOI: 10.4018/IJAEIS.2017100103


The objective of this study was to evaluate the efficiency of support vector machine (SVM) and artificial neural network (ANN) methods for the classification and prediction of groundwater quality using a small data record in Malayer, Iran. For this purpose, 14 groundwater quality variables collected from 27 groundwater sampling wells were used. Cluster analysis separated the sampling stations into two groups. The classification was implemented by SVM using polynomial and RBF kernels. The sensitivity and specificity of this model were 0.89 and 0.80, respectively, while the positive predictive value and negative predictive value were 0.89 and 0.86, respectively. The water quality index (WQI) was predicted using ANN. Despite the high correlation coefficient between the predicted and observed values of WQI (r = 0.90), the generalization ability of this model was low (r = 0.60), indicating over-fitting of the model to the training data set.
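For readers less familiar with the four classification measures reported above, the following minimal sketch shows how each is derived from the counts of a two-class confusion matrix. The function name and the counts are hypothetical and chosen only for illustration; they are not taken from the study and do not reproduce its results.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity, PPV and NPV from confusion-matrix counts."""
    return {
        # Share of truly positive cases that the classifier labels positive
        "sensitivity": tp / (tp + fn),
        # Share of truly negative cases that the classifier labels negative
        "specificity": tn / (tn + fp),
        # Share of positive predictions that are correct
        "ppv": tp / (tp + fp),
        # Share of negative predictions that are correct
        "npv": tn / (tn + fn),
    }


# Hypothetical counts for illustration only (NOT the study's confusion matrix)
metrics = classification_metrics(tp=9, fp=2, tn=8, fn=1)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With a data record as small as the one described here, each of these ratios rests on a handful of wells, so a single reclassified well can shift a value noticeably.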
Article Preview


Because of water shortage in Iran, especially in the arid and semi-arid areas in the western parts of the country where intensive agricultural activity is prevalent (Jalali, 2006), the over-exploitation of groundwater resources has become a serious environmental problem. In recent years, low precipitation and the mismanagement of the available surface water resources have led people to use groundwater for drinking as well as for agricultural purposes. Given that these water resources receive no treatment, the quality of groundwater should be monitored periodically to ensure that poor-quality groundwater is not being used. Thus, to ascertain the suitability of groundwater for any purpose, it is essential to classify and forecast the quality of water resources (Subramani et al., 2005).

Classification methods in data mining fall into two categories: unsupervised and supervised learning. In unsupervised methods, inferences are drawn from training data without labeled responses. The most common unsupervised technique is cluster analysis, which is used in exploratory data analysis to find hidden patterns or groupings in data. This learning method relies on a measure of similarity defined by metrics such as Euclidean or probabilistic distance. Multiple algorithms are associated with this technique, and it has been widely applied by researchers for the unsupervised classification of surface water (e.g. Hajigholizadeh and Melesse, 2017) and groundwater (e.g. Cloutier et al., 2008; Devic et al., 2014; Jiang et al., 2015) sampling stations with respect to physicochemical water quality parameters.

Many other supervised and unsupervised classification methods appear in the literature; however, in water quality research, the artificial neural network (ANN) and the support vector machine (SVM) are the most widely applied (e.g. Khalil et al., 2005; Yoon et al., 2011; Modaresi and Araghinejad, 2014). Statistical methods such as ANNs and SVMs can be used as surrogates for complex mathematical models because they do not require knowledge of the mathematical form of the relationship between inputs and corresponding outputs (Dixon, 2009). In addition, these approximation methods are a viable alternative when working with incomplete information and with the ill-defined and imprecise relationships among input and output variables that are common in groundwater contamination (Dixon, 2009). For this reason, their usage in groundwater quality modeling has increased recently. The SVM is based on structural risk minimization (SRM) and was first introduced by Vapnik (1995). It is also a relatively new structure in the data-driven prediction field (Yoon et al., 2011).

Some of the reasons for the wide application of SVMs in recent years, as explained by Aryafar et al. (2012), are their high learning ability with a small number of parameters, their robustness to model error, and their computational efficiency compared with several other statistical learning methods. At the same time, awareness of neural modeling in the environmental field is becoming increasingly evident, given the growing number of published studies and reference works cataloged in the literature (e.g. Wieland et al., 2010; Kisi et al., 2013).
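The idea of grouping sampling stations by a Euclidean similarity measure can be sketched in a few lines. The example below is a minimal illustration that assumes single-linkage agglomerative clustering, chosen here only for brevity; the text above does not state which clustering algorithm the authors used. The function name `single_linkage` and the two-variable `wells` records are hypothetical.

```python
import math


def single_linkage(points, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    # Start with every point in its own cluster (stored as lists of indices)
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: cluster distance is the smallest
                # Euclidean distance between any pair of members
                d = min(
                    math.dist(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical, already standardized measurements for five wells
wells = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25), (2.0, 2.1), (2.2, 1.9)]
groups = single_linkage(wells, n_clusters=2)
print(groups)  # → [[0, 1, 2], [3, 4]]
```

In practice the variables would usually be standardized first, since Euclidean distance is otherwise dominated by whichever water quality parameter has the largest numerical range.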
