Comparison of the Hybrid Credit Scoring Models Based on Various Classifiers

Fei-Long Chen, Feng-Chia Li
DOI: 10.4018/978-1-4666-0158-1.ch012


Credit scoring is an important topic for businesses and socio-economic establishments, which collect huge amounts of data with the aim of avoiding wrong decisions. In this paper, the authors propose four feature selection approaches and combine them with four well-known classifiers: K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Back-Propagation Network (BPN), and Extreme Learning Machine (ELM). The goal is to find a suitable hybrid combination of feature selection approach and classifier that retains sufficient information for classification purposes. To this end, different credit scoring models are constructed by pairing the four feature selection approaches with the classifiers. Two credit data sets from the University of California, Irvine (UCI), are used to evaluate the accuracy of the various hybrid feature selection models. The procedures that make up the proposed approaches are described, and their performance is then evaluated.
Chapter Preview


Credit scoring has long been regarded as a critical topic, and the departments responsible for it strive to collect large amounts of data to avoid making wrong decisions. Consumer credit prediction is a particularly important issue. Credit scoring models are developed to distinguish good customers from bad ones based on related attributes, such as age, marital status, and income, or on their past records. Credit scoring can therefore be regarded as a binary classification problem in which each observation is assigned to one of two pre-defined groups. Previous studies have focused on increasing the accuracy of credit scoring models, since even a small improvement can result in significant cost savings. Modern data mining techniques have been adopted to build credit scoring models (Huang, Chen, & Wang, 2007). Researchers have developed a variety of approaches, including the linear discriminant approach (Bellotti & Crook, 2009; Lee & Chen, 2005; Thomas, 2000), the decision tree approach (Huang & Wang, 2006), the rough sets theory approach (Caballero, Alvarez, Bel, & Garcia, 2007), the F-score approach (Chen & Lin, 2005), case-based reasoning (Osman, Taha, & Dhavalkumar, 2009), association analysis (Hashemi, Ray, & Le Blanc, 2009), and the genetic programming approach (Ong, Huang, & Tzeng, 2005). Most of these credit scoring models have been developed in the past few years with the aim of improving accuracy. Classic evaluation measures, such as accuracy, information, distance, and dependence, have been used to remove irrelevant features. In addition, artificial intelligence and machine learning techniques have been used to solve some decision-making problems (Moisan & Sabine, 2010). Dash and Liu (1997) provided a detailed survey and overview of the existing feature selection methods and suggested a feature selection process. The conclusions of previous studies, however, are often contradictory (Baesens et al., 2003).
Recently, researchers have proposed hybrid data mining approaches for designing effective credit scoring models. For example, Lee et al. (2002) integrated neural networks with the traditional discriminant analysis approach. Chou et al. (2006) applied machine learning techniques such as Back-Propagation Network (BPN), Decision Tree (DT), and Support Vector Machine (SVM) to solve credit scoring problems. According to previous studies, machine learning techniques are superior to traditional approaches in dealing with credit scoring problems, especially in nonlinear pattern classification (Wu, Huang, & Meng, 2008; Yu & Liu, 2004). Conventional statistical classification requires an underlying probability model to be assumed. The more recently developed data mining techniques can perform the classification task without this limitation and achieve better performance than traditional statistical approaches (Huang et al., 2007). Feature selection methods can be categorized into filter and wrapper approaches (Liu, 1998). The filter approach selects important features independently of any learning algorithm, so that feature selection is separated from the classifier. It relies on various measures of the general characteristics of the training data, such as distance, information, dependency, and consistency. The wrapper approach instead uses the predictive accuracy of a pre-determined learning algorithm to evaluate the quality of the selected subsets. Generally, filters are faster and can be used as a preprocessing step to reduce space dimensionality and over-fitting. The wrapper approach, on the other hand, may perform better in finding useful subsets of relevant variables (Guyon & Elisseeff, 2003). However, the underlying subset search problem is known to be NP-hard (Amaldi & Kann, 1998), and the search quickly becomes computationally intractable. Handling a large number of features is also computationally expensive (John, Kohavi, & Pfleger, 1994).
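The difference between the filter and wrapper approaches can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the chapter's actual procedure: the synthetic data from `make_data`, the use of the F-score as the filter measure, greedy forward selection as the wrapper search, and a leave-one-out k-NN classifier as the pre-determined learning algorithm.

```python
import random
from statistics import mean, variance


def make_data(n=60, seed=0):
    """Hypothetical two-class data: features 0 and 1 are informative, 2 and 3 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(2.0 * label, 1.0), rng.gauss(-1.5 * label, 1.0),
                  rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)])
        y.append(label)
    return X, y


def f_score(X, y, j):
    """Filter measure: F-score of feature j, computed from the data alone."""
    pos = [row[j] for row, t in zip(X, y) if t == 1]
    neg = [row[j] for row, t in zip(X, y) if t == 0]
    overall = mean(pos + neg)
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    return between / (variance(pos) + variance(neg))


def knn_loo_accuracy(X, y, features, k=3):
    """Leave-one-out accuracy of a k-NN classifier restricted to `features`."""
    correct = 0
    for i in range(len(X)):
        neighbours = sorted(
            (sum((X[i][f] - X[j][f]) ** 2 for f in features), y[j])
            for j in range(len(X)) if j != i
        )[:k]
        votes = sum(label for _, label in neighbours)
        correct += (1 if 2 * votes > k else 0) == y[i]
    return correct / len(X)


def filter_select(X, y, m):
    """Filter: rank features by F-score and keep the top m; no classifier is consulted."""
    ranked = sorted(range(len(X[0])), key=lambda j: f_score(X, y, j), reverse=True)
    return sorted(ranked[:m])


def wrapper_select(X, y, m):
    """Wrapper: greedy forward selection scored by the classifier's own accuracy."""
    chosen = []
    while len(chosen) < m:
        candidates = [j for j in range(len(X[0])) if j not in chosen]
        chosen.append(max(candidates, key=lambda j: knn_loo_accuracy(X, y, chosen + [j])))
    return sorted(chosen)


X, y = make_data()
filter_features = filter_select(X, y, 2)
wrapper_features = wrapper_select(X, y, 2)
print("filter :", filter_features, knn_loo_accuracy(X, y, filter_features))
print("wrapper:", wrapper_features, knn_loo_accuracy(X, y, wrapper_features))
```

The filter computes one score per feature, while the wrapper retrains and re-evaluates the classifier for every candidate subset, which is why it is slower. Greedy forward selection is only a heuristic that avoids the exhaustive subset search.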
In this research, four classifiers are combined with four feature selection approaches to achieve better classification. Parameter tuning is also necessary before designing the hybrid feature selection models: each classifier has parameters that must be tuned to achieve the highest accuracy rate on the credit scoring data sets.
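As a rough illustration of parameter tuning, the sketch below runs a grid search over the number of neighbours k of a k-NN classifier, scored by cross-validated accuracy. The synthetic data, the candidate grid, the interleaved five-fold split, and all function names are hypothetical. This preview does not state which parameters or tuning procedure the authors used.

```python
import random


def make_data(n=80, seed=1):
    """Hypothetical two-class data standing in for a credit data set."""
    rng = random.Random(seed)
    X = [[rng.gauss(1.5 * (i % 2), 1.0), rng.gauss(-1.0 * (i % 2), 1.0)]
         for i in range(n)]
    y = [i % 2 for i in range(n)]
    return X, y


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k nearest training points (squared Euclidean distance)."""
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )[:k]
    return 1 if 2 * sum(label for _, label in nearest) > k else 0


def cv_accuracy(X, y, k, folds=5):
    """Mean k-NN accuracy over `folds` interleaved cross-validation folds."""
    scores = []
    for f in range(folds):
        test_idx = [i for i in range(len(X)) if i % folds == f]
        train_X = [X[i] for i in range(len(X)) if i % folds != f]
        train_y = [y[i] for i in range(len(X)) if i % folds != f]
        hits = sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return sum(scores) / folds


def grid_search(X, y, grid):
    """Return (best_k, best_accuracy) over the candidate values in `grid`."""
    results = {k: cv_accuracy(X, y, k) for k in grid}
    best_k = max(results, key=results.get)
    return best_k, results[best_k]


X, y = make_data()
grid = [1, 3, 5, 7, 9]  # odd values avoid tied votes in a two-class problem
best_k, best_acc = grid_search(X, y, grid)
print("best k:", best_k, "cross-validated accuracy:", round(best_acc, 3))
```

The same loop would apply to other classifiers by swapping in their own parameters, for example the kernel settings of an SVM or the number of hidden nodes in a BPN or ELM.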
