Predicting the Success of Ensemble Algorithms in the Banking Sector

Özge Hüsniye Namlı Dağ (Department of Industrial Engineering, Turkish - German University, Istanbul, Turkey)
Copyright: © 2019 | Pages: 20
DOI: 10.4018/IJBAN.2019100102

Abstract

The banking sector, like other service sectors, evolves in accordance with customers' needs. Knowing the needs of customers and predicting customer behavior are therefore very important for competing in the banking sector. Data mining uncovers relationships and hidden patterns in large data sets. Classification algorithms, one of the applications of data mining, are used very effectively in decision making. In this study, the C4.5 algorithm, a decision tree algorithm widely used in classification problems, is integrated with ensemble machine learning methods in order to increase the efficiency of the algorithms. Data obtained via direct marketing campaigns of a Portuguese bank were used to classify whether or not customers have term deposit accounts. Artificial Neural Networks and Support Vector Machines, as traditional artificial intelligence methods, and Bagging-C4.5 and Boosted-C4.5, as ensemble-decision tree hybrid methods, were used for classification. Bagging-C4.5 achieved better classification performance than the other algorithms used. In this study, the ensemble-decision tree hybrid methods give better results than artificial neural networks and support vector machines.

Introduction

Banking, which originated in the Middle Ages, has reached its present position through significant developments. The banking sector, which grows in line with customer needs, is important for people and institutions. It is also a sector in which competition is very high. Given all this, knowing customers' needs in advance and predicting customers' behavior are important for competing in the banking sector, as in other sectors. Firms therefore need strong predictions to increase their competitiveness. At this point, data mining techniques and algorithms are very effective tools for solving complex business problems (Rahman, 2018). Data mining methods and techniques are used in many different fields, from the health sector to the retail sector. Data mining can be defined as a process for discovering patterns and relationships in complex data in a database in order to build predictive models (Kincade, 1998; Safdari, Rezaei-Hachesu, GhaziSaeedi, Samad-Soltani, & Zolnoori, 2018). One of the methods frequently used in data mining is classification. In classification, a model learned from the training data is applied to predict the classes of test data whose classes are not specified.

In this study, data obtained from a direct marketing campaign by a Portuguese banking institution are used. The data are classified according to whether or not customers subscribed to a term deposit account. For classification, Artificial Neural Network (ANN) and Support Vector Machine (SVM) algorithms are used as traditional artificial intelligence algorithms, and bagging and boosting algorithms are used as ensemble algorithms. Outliers and extreme values are removed from the data set. The ReliefF algorithm, the correlation-based feature selection algorithm and Chi-Squared attribute evaluation are then used to select the features to be used in the application. Afterwards, the data set is divided into training and test sets in five different ways: 10-fold cross validation, 5-fold cross validation, 2-fold cross validation, 80% split and 70% split. The aforementioned algorithms are applied to each training and test set for classification. The overall framework for forecasting in this study is illustrated in Figure 1.
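The five evaluation schemes listed above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the procedure used in the study: the function names (`k_fold_indices`, `percentage_split`, `evaluation_schemes`), the fixed random seed and the unstratified shuffling are all hypothetical choices made for the example.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists for k-fold cross validation.

    Assumption: plain shuffled folds, no stratification by class.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal folds
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test

def percentage_split(n, train_fraction, seed=0):
    """Return one (train, test) pair; e.g. 0.8 keeps 80% for training."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(n * train_fraction)
    return idx[:cut], idx[cut:]

def evaluation_schemes(n):
    """Map each scheme named in the text to its list of (train, test) pairs."""
    return {
        "10-fold CV": list(k_fold_indices(n, 10)),
        "5-fold CV": list(k_fold_indices(n, 5)),
        "2-fold CV": list(k_fold_indices(n, 2)),
        "80% split": [percentage_split(n, 0.8)],
        "70% split": [percentage_split(n, 0.7)],
    }
```

Each classifier would then be trained on every `train` list and scored on the matching `test` list, with the cross validation scores averaged over the folds.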

The aim of this study is to compare the prediction performance of the bagging and boosting ensemble machine learning algorithms, which use C4.5 as the base classifier, with that of the ANN and SVM algorithms, traditional artificial intelligence algorithms that are widely used and give good results. For this comparison, feature selection is applied to the data set to increase the prediction success of the algorithms.
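To make the idea of bagging around a tree-based base classifier concrete, the following is a minimal sketch. It is an illustration only: a one-level decision stump is swapped in for C4.5 to keep the example short, and the names `fit_stump`, `fit_bagging` and `predict_bagging`, the number of models and the toy data are assumptions, not details taken from the study.

```python
import random
from collections import Counter

def fit_stump(X, y):
    """Fit a one-split tree (stand-in for C4.5) with the fewest training errors."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0] if right else l_lab
            errors = (sum(lab != l_lab for lab in left)
                      + sum(lab != r_lab for lab in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    return best[1:]  # (feature, threshold, left_label, right_label)

def predict_stump(stump, row):
    f, t, l_lab, r_lab = stump
    return l_lab if row[f] <= t else r_lab

def fit_bagging(X, y, n_models=15, seed=0):
    """Train each base model on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models

def predict_bagging(models, row):
    """Combine the base models by majority vote."""
    votes = Counter(predict_stump(m, row) for m in models)
    return votes.most_common(1)[0][0]

# Made-up toy data: one numeric feature, two classes.
X = [[1], [2], [3], [4], [10], [11], [12], [13]]
y = ["no"] * 4 + ["yes"] * 4
models = fit_bagging(X, y)
print(predict_bagging(models, [0]), predict_bagging(models, [20]))
```

Boosting differs in that the base models are trained in sequence, with each round giving more weight to the instances the previous models misclassified, rather than on independent bootstrap samples.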

The remainder of the paper is organized as follows. The next section reviews the literature on classification, one of the methods of data mining. The third section describes the content of the data set and defines the problem. The fourth section explains the methodology of the classification algorithms used in the study. In the fifth section, the application is carried out: customers' decisions about the term deposit account are forecast by applying the proposed algorithms to each of the training and test sets, which were separated in different ways, and the results are presented. In the sixth section, the ensemble algorithms are compared with the traditional artificial intelligence algorithms in terms of forecasting success. To compare the classification results, performance measures such as the percentage of correct classification, the kappa statistic, Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), Root Relative Squared Error (RRSE) and the area under the Receiver Operating Characteristic (ROC) curve are used. The performance measures show that good classification performance is achieved with the Bagging-C4.5 algorithm. The ensemble methods are shown to perform better than the artificial neural network and support vector machine algorithms, which are traditional artificial intelligence methods. The last section concludes the paper.
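For readers unfamiliar with the performance measures named above, the sketch below computes them for a binary problem using common textbook definitions. It rests on assumptions: the positive class is coded as 1, `scores` are predicted probabilities of the positive class, and RRSE is taken relative to always predicting the mean target. The software used in the study may define the error measures slightly differently, and the function names are illustrative.

```python
import math

def accuracy(actual, predicted):
    """Fraction of correctly classified instances."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

def cohen_kappa(actual, predicted):
    """Agreement corrected for chance; undefined if chance agreement is 1."""
    n = len(actual)
    p_o = accuracy(actual, predicted)
    labels = set(actual) | set(predicted)
    p_e = sum((actual.count(c) / n) * (predicted.count(c) / n) for c in labels)
    return (p_o - p_e) / (1 - p_e)

def mae(targets, scores):
    return sum(abs(t - s) for t, s in zip(targets, scores)) / len(targets)

def rmse(targets, scores):
    return math.sqrt(sum((t - s) ** 2 for t, s in zip(targets, scores)) / len(targets))

def rrse(targets, scores):
    """Squared error relative to a baseline that always predicts the mean target."""
    mean_t = sum(targets) / len(targets)
    num = sum((t - s) ** 2 for t, s in zip(targets, scores))
    den = sum((t - mean_t) ** 2 for t in targets)
    return math.sqrt(num / den)

def roc_auc(targets, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for t, s in zip(targets, scores) if t == 1]
    neg = [s for t, s in zip(targets, scores) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

Higher values are better for accuracy, kappa and the area under the ROC curve, whereas lower values are better for MAE, RMSE and RRSE.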

Figure 1. Research framework

Literature Review

Many studies on ensemble methods can be found in the literature. These studies can be summarized as follows:
