Credit Scoring in the Age of Big Data

Billie Anderson (Bryant University, USA) and J. Michael Hardin (University of Alabama, USA)
Copyright: © 2014 | Pages: 9
DOI: 10.4018/978-1-4666-5202-6.ch049

Chapter Preview



Credit scoring is a method of modeling the potential risk of credit applicants. It uses statistical techniques and historical data to create a credit score that financial institutions use to assess the risk posed by each applicant. Credit scoring is essentially a classification problem: which credit applicants should be considered good risks and which should be considered bad risks.

A scorecard model is built from a number of characteristic inputs. Each characteristic consists of a number of attributes. In the example scorecard shown in Figure 1, age is a characteristic and “25–33” is an attribute. Each attribute is associated with a number of scorecard points. These scorecard points are statistically assigned to differentiate risk, based on the predictive power of the variables, correlation between the variables, and business considerations.

Figure 1. Example scorecard

For example, in Figure 1, the credit application of a 32-year-old person who owns a home and earns $30,000 would be accepted for credit by this institution. The total score of an applicant is the sum of the points for each scorecard attribute that applies to that applicant. Lower scores imply a higher risk of default, and higher scores imply lower risk.
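The additive scoring rule described above can be sketched in a few lines of Python. This is only an illustration: the attribute bands, point values, and acceptance cutoff are hypothetical and are not taken from Figure 1, and the names `NUMERIC_BANDS`, `CATEGORY_POINTS`, `CUTOFF`, `total_score`, and `decide` are invented for this sketch.

```python
# Hypothetical scorecard. Every band, point value, and the cutoff below is an
# illustrative assumption; none of them is taken from Figure 1.
NUMERIC_BANDS = {
    # characteristic: list of (low, high, points), bounds inclusive
    "age": [(18, 24, 10), (25, 33, 20), (34, 120, 35)],
    "income": [(0, 24_999, 5), (25_000, 49_999, 15), (50_000, 10_000_000, 30)],
}
CATEGORY_POINTS = {
    "residence": {"own": 30, "rent": 15, "other": 5},
}
CUTOFF = 60  # assumed minimum total score for acceptance


def attribute_points(characteristic, value):
    """Return the points of the attribute that `value` falls into."""
    if characteristic in CATEGORY_POINTS:
        return CATEGORY_POINTS[characteristic][value]
    for low, high, points in NUMERIC_BANDS[characteristic]:
        if low <= value <= high:
            return points
    raise ValueError(f"no attribute covers {characteristic}={value!r}")


def total_score(applicant):
    """Sum the points of the one matching attribute per characteristic."""
    return sum(attribute_points(name, value) for name, value in applicant.items())


def decide(applicant, cutoff=CUTOFF):
    """Accept when the total score reaches the cutoff, otherwise decline."""
    return "accept" if total_score(applicant) >= cutoff else "decline"


applicant = {"age": 32, "residence": "own", "income": 30_000}
print(total_score(applicant), decide(applicant))
```

Under these assumed values, a higher total means lower estimated risk, so raising or lowering the cutoff trades off the acceptance rate against expected defaults.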

Within the past few years, the development of an accurate credit scoring model has become a priority for several reasons: growing competition among credit card companies, a rising number of bad loans resulting from a weak United States economy, and the need for more stringent government regulation. The recent Congressional Dodd-Frank Act added 250 regulations that will involve 11 governmental bodies. These challenges have prompted the financial industry to explore and use non-traditional data sources as part of the loan-granting decision.

For years the financial industry has managed large volumes of data generated by customer, operational, and regulatory sources. Thus, banks are thoroughly familiar with big data, meaning massive amounts of unstructured data. Financial service lenders make up the most data-intensive economic sector (Rubin, 2011). A shift is taking place in how individuals interact with their bank: many customers are turning to digital channels to conduct transactions rather than relying on the traditional face-to-face branch relationship. Banks must now respond by using the data collected from digital sources to make real-time loan offers with the highest acceptance rate possible. Banks are learning how to use big data sources to monitor changes in customer behavior and to improve the banking experience instantaneously. More data is now available than ever before; the challenge for financial institutions will be how to put that data to work and make the smartest lending decisions.

This chapter describes how banks and financial organizations are starting to incorporate big data sources, such as data from social media websites, into the credit lending process. It discusses how more established organizations, such as Experian and SAS, are incorporating big data into their scorecard methodology. It also describes two start-up companies that rely exclusively on electronic and big data sources, such as social media, to provide banking services and grant loans.



The statistical methods used to categorize objects into groups can be traced to Fisher’s 1936 publication (Fisher, 1936). Durand (1941) was the first to use Fisher’s methodology to distinguish between good and bad loans. Building on this research, the founders of credit scoring, Bill Fair and Earl Isaac, built the first credit scoring system for the United States in 1958. Although credit scoring has been in use since that time, it has become widespread only recently.

Abdou & Pointon (2011) provide the most recent and comprehensive review of the credit scoring literature. The authors surveyed 214 articles, books, and manuscripts related to credit scoring applications in the business field, and they conclude that different types of credit scoring models are applicable in different situations. Some older, yet still very relevant, credit scoring resources discuss the statistical issues in developing a credit scorecard (Siddiqi, 2006; Thomas, Oliver, & Hand, 2005; Hand & Henley, 1997).

Key Terms in this Chapter

Klout Score: A Klout score is a numerical measure, on a scale of 1 to 100, of how influential an individual is in an online social network setting. The higher the Klout score, the more influential the individual is considered to be; for example, it reflects whether a person is influential in getting others to buy a product or sign up for a service.

Social Media: A Web-based social outlet. Social media may take the form of websites that allow individuals to communicate and establish social networks, such as Facebook, or of Web-based videos, such as those viewed on YouTube.

Social Media Credit Score: A measure of social risk that is created using information from social media outlets.

Near Field Communication: Near field communication allows wireless communication and data exchange between digital devices like smartphones. The technology utilizes electromagnetic radio fields to allow a mobile phone to collect data from another device at close range. In many ways it is like a contactless payment card that is integrated into a phone. It is similar to Bluetooth or Wi-Fi technology, except that instead of programming two devices to work together, they can simply touch to establish a connection.

Unstructured Data Source: Data that has no identifiable structure, such as e-mails, images, text, Web logs, and videos. Unstructured data is data that is not contained in a traditional database.

Hadoop: Open source software that stores and analyzes massive unstructured data sets.

Behavioral Game Technology: Technology such as Facebook games that can be used to empirically determine how individuals make decisions under uncertainty and strategic interaction in a social media setting.

Tweet: A posting on the social media site Twitter that consists of 140 characters or fewer.

Structured Data Source: Data that contains an identifiable structure. Structured data is typically stored in traditional databases with identifiable rows and columns.

Credit Score: A measure of risk that indicates how likely an individual is to repay a financial loan.
