A New Hybrid Support Vector Machine Ensemble Classification Model for Credit Scoring

Jian-Rong Yao (ZheJiang University of Finance & Economics, Hangzhou, China) and Jia-Rui Chen (ZheJiang University of Finance & Economics, Hangzhou, China)
Copyright: © 2019 | Pages: 12
DOI: 10.4018/JITR.2019010106

Abstract

Credit scoring plays an important role in the financial industry. Many methods are used for credit scoring: traditional statistical methods such as logistic regression, discriminant analysis, and linear regression, and machine learning methods such as neural networks, k-nearest neighbors, genetic algorithms, support vector machines (SVM), and decision trees. SVM has demonstrated good classification performance. This paper proposes a new hybrid RF-SVM ensemble model, which uses random forest to select important variables and employs ensemble methods (bagging and boosting) to aggregate single base models (SVM) into a robust classifier. The experimental results suggest that the new model achieves an effective improvement and has promising potential in the field of credit scoring.

Introduction

Credit scoring, also called credit assessment, aims to judge a consumer’s creditworthiness when they apply for a loan. Credit scoring is the cornerstone of the financial industry, because accurately assessing customers’ credit can increase returns and avoid losses for financial institutions (Shi, Zhang, & Qiu, 2013). Many financial risks have arisen from weak credit risk management. With the growing popularity of Internet finance, credit scoring has attracted significant attention (Wang, Qi, Fu, & Liu, 2016).

Earlier credit scoring models were usually built with statistical methods such as Linear Discriminant Analysis (LDA) (Fisher, 1936) and Logistic Regression (LR) (Wiginton, 1980). Steenackers and Goovaerts (1989) used a logistic regression model to build a numerical scoring system for personal loans. However, these methods often rely on strict assumptions that are usually violated in practice (Wang, Hao, Ma, & Jiang, 2011). In the age of big data, financial institutions have abandoned some traditional credit scoring methods and are gradually turning to new methods and algorithms. More advanced methods, collectively called machine learning, have been shown to perform better than the older models; they include support vector machines (SVM) (Cortes & Vapnik, 1995), neural networks (NN) (Desai, Crook, & Overstreet, 1996), random forest (RF) (Leo, 2001), and K-nearest neighbors (KNN) (Zhang & Zhou, 2007).

Much evidence suggests that the most accurate method is probably the support vector machine (SVM) (Crook, Edelman, & Thomas, 2007), and many researchers have worked on improving it. Bai-Heng, Zhu, and Jie (2016) combined the Synthetic Minority Over-sampling Technique with SVM to handle imbalanced datasets. Shi and Xu (2016) adopted fuzzy membership so that each input point contributes differently to the learning of the SVM for credit scoring. Nanni and Lumini (2009) studied several types of ensemble learners for credit risk analysis. Wang et al. (2011) compared three popular ensemble methods, Bagging, Boosting, and Stacking, based on LRA, DT, ANN, and SVM. Many studies regard ensemble learning as a powerful machine learning paradigm that works in practice.
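As a rough illustration of the bagging idea mentioned above, the following Python sketch trains several base models on bootstrap resamples and combines them by majority vote. It is not the model from this paper: the base learner is a hypothetical one-feature threshold rule (`train_stump`) standing in for an SVM, and the toy data, the function names, and the `n_models` value are all assumptions made for illustration.

```python
import random
from collections import Counter


def train_stump(rows, labels):
    """Fit a one-feature threshold rule (an assumed stand-in for an SVM)."""
    # If a resample contains a single class, fall back to a constant predictor.
    if len(set(labels)) == 1:
        only_label = labels[0]
        return lambda row: only_label
    best = None
    for j in range(len(rows[0])):
        for threshold in sorted({r[j] for r in rows}):
            for sign in (1, -1):
                preds = [1 if sign * (r[j] - threshold) > 0 else 0 for r in rows]
                errors = sum(p != y for p, y in zip(preds, labels))
                # Keep the first rule with the fewest training errors.
                if best is None or errors < best[0]:
                    best = (errors, j, threshold, sign)
    _, j, threshold, sign = best
    return lambda row: 1 if sign * (row[j] - threshold) > 0 else 0


def bagging_fit(rows, labels, n_models=11, seed=0):
    """Train n_models base learners, each on a bootstrap resample."""
    rng = random.Random(seed)
    n = len(rows)
    models = []
    for _ in range(n_models):
        # Sample n indices with replacement (the bootstrap step).
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx]))
    return models


def bagging_predict(models, row):
    """Aggregate the base learners by majority vote."""
    votes = Counter(model(row) for model in models)
    return votes.most_common(1)[0][0]


# Toy data (made up): feature 0 separates the classes, feature 1 is noise.
rows = [[1.0, 5.0], [2.0, 3.0], [3.0, 6.0], [4.0, 2.0],
        [6.0, 5.5], [7.0, 2.5], [8.0, 6.5], [9.0, 3.5]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = good credit, 0 = bad credit

models = bagging_fit(rows, labels)
print(bagging_predict(models, [0.5, 4.0]), bagging_predict(models, [9.5, 4.0]))
```

An odd number of base models avoids tied votes in the two-class case. Boosting and stacking differ in how the base models are trained and combined, and are not shown here.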

Feature selection is often used in data mining problems, especially when the data volume is large. Too many features or variables not only increase the computational cost but can also reduce the accuracy of the final results. Feature subset selection algorithms can be classified into two categories: the filter approach and the wrapper approach. The filter approach acts like a filter, selecting the most important variables before classification or other tasks. The wrapper approach must work with a pre-determined learning method to select the optimal subset. Feature reduction based on rough sets can ensure that the selected variables keep a correlation similar to that of the original dataset (Liu, Jiang, & Yang, 2010). Chen and Li (2010) used four approaches, LDA, decision tree, rough sets, and F-score, as a feature pre-processing step to optimize the feature space. They found that combining these feature selection methods with SVM to build a credit scoring model is mostly robust and effective in finding optimal subsets and is a promising method for data mining. The hybrid GA-SVM strategy proposed by Huang, Chen, and Wang (2007) can perform feature selection and model parameter optimization simultaneously; the authors claimed that SVM is a promising addition to existing data mining methods.
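To make the filter approach concrete, here is a minimal Python sketch that ranks features by F-score before any classifier is trained. The formula is the commonly used two-class F-score (between-class separation of the means divided by the sum of within-class sample variances); whether the cited work used exactly this definition is an assumption, and the function names and toy data are hypothetical. The paper itself selects variables with random forest, which is not shown here.

```python
from statistics import mean, variance


def f_score(values, labels):
    """Two-class F-score of one feature (assumed definition, see lead-in)."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    overall = mean(values)
    # Numerator: how far each class mean lies from the overall mean.
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    # Denominator: sample variances within each class (needs >= 2 points each).
    within = variance(pos) + variance(neg)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return between / within


def rank_features(rows, labels):
    """Return feature indices sorted from highest to lowest F-score."""
    n_features = len(rows[0])
    scores = [f_score([r[j] for r in rows], labels) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def select_top_k(rows, labels, k):
    """Filter step: keep only the k highest-scoring feature columns."""
    keep = sorted(rank_features(rows, labels)[:k])
    return [[r[j] for j in keep] for r in rows]


# Toy data (made up): feature 0 is informative, feature 1 is mostly noise.
rows = [[1.0, 5.0], [2.0, 3.0], [3.0, 6.0], [4.0, 2.0],
        [6.0, 5.5], [7.0, 2.5], [8.0, 6.5], [9.0, 3.5]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

print(rank_features(rows, labels))
print(select_top_k(rows, labels, 1))
```

Because the ranking never consults a classifier, this is a filter. A wrapper would instead retrain the chosen learner on candidate subsets and keep the subset that scores best.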
