Machine Learning and Exploratory Data Analysis in Cross-Sell Insurance

Anand Jha, Brajkishore Prajapati

Source Title: Encyclopedia of Data Science and Machine Learning

DOI: 10.4018/978-1-7998-9220-5.ch039

Abstract

Data plays a central role in the insurance industry. In the digital era of its 700+ year history, the industry is dominated by data collection to inform future decisions. This chapter focuses on exploratory data analysis (EDA) to identify significant and critical factors for developing business strategy, as well as to predict customers' responses in cross-sell health insurance. The response is either acceptance or rejection of a health insurance product offered to existing customers, who may or may not hold policies with the company. EDA presents data analysis and visualization from various perspectives to characterize data in ways that can help the insurer in strategic decision making.

Introduction

Insurance is a financial shield against a risk or threat. A business corporation that sells such a financial shield and an entity that buys it are called the insurer and the insured, respectively. A written agreement between the insurer and the insured defines this financial shield and is called the policy document or insurance policy. The shield is given to the insured in exchange for a monetary payment called the premium (Aswani et al., 2020).

Insurance policies can be of several kinds, according to the different kinds of risk they cover. Some popular types of insurance are Life Insurance, Health Insurance, Vehicle Insurance, Travel Insurance and Property Insurance (Bacry et al., 2020). This basic idea of reducing risk, in particular monetary risk, is the reason the entire industry exists (Batra et al., 2021).

The insurance industry's access to the volume of data generated by new technologies has increased tremendously (Accenture, 2018). It can be said that more data have been created and collected in the past couple of years than human society had ever produced before, owing to an explosion in data from a host of sources such as telematics, Internet usage, social media activity, voice analytics, connected sensors and wearable devices.

Big data can be unstructured, semi-structured or structured (Chakraborty & Kar, 2017; Kar, 2016). It is well known that several insurance businesses still use actuarial formulas, but data science and analytics can now be used to extract hidden information that supports better strategic and administrative decision making than conventional techniques (Chowdhury et al., 2020; Karhade et al., 2019). Data technologies can play a significant role in different facets of the insurance business, such as risk assessment, claim analysis, underwriting analysis, customer profiling and fraud detection (Das et al., 2020; Das et al., 2021).

Together with various data analysis and visualization techniques, machine learning (ML), a sub-domain of artificial intelligence (AI), comes to the rescue of the insurance industry in making use of such datasets. Machine learning teaches computers to think in a way similar to humans, learning from and improving upon past experience. Practically any task that can be accomplished with a data-defined pattern or set of rules can be automated with machine learning (Accenture, 2018). ML techniques can be used effectively across structured, semi-structured and unstructured datasets (Burri et al., 2019; Chakraborty & Kar, 2017; Kar, 2016). Yet most insurers are struggling to maximise the benefits of machine learning and are therefore unable to unearth the analytical insights hidden in their datasets. Machine learning has been in use for the last few decades, so it is not a novel technology. Supervised learning, unsupervised learning and reinforcement learning are its three core classes. The majority of insurers work with supervised learning for risk assessment, using identified parameters to obtain a preferred outcome. In the last few decades, however, unsupervised learning has also been gaining popularity among present-day insurers.

Key Terms in this Chapter

Supervised Learning: When the machine learns under supervision, it is called supervised learning. It uses a labelled dataset, meaning one that contains the answer or solution for each example. For example, a labelled animal dataset may contain images with labels like elephant, cat, etc. A machine learning model trained on the labelled dataset can predict the animal whenever a new animal image is fed to it, based on what it learned from the labelled examples.
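As a rough illustration of learning from labelled examples, the sketch below uses a 1-nearest-neighbour rule, which is only one of many supervised methods and is not claimed to be the method used in the chapter. Instead of images, it assumes two made-up numeric features (weight and height); the data and the `predict` helper are hypothetical.

```python
from math import dist

# Hypothetical labelled dataset: (weight in kg, height in cm) -> label.
labelled = [
    ((4000.0, 300.0), "elephant"),
    ((5200.0, 320.0), "elephant"),
    ((4.0, 25.0), "cat"),
    ((5.5, 23.0), "cat"),
]

def predict(features):
    """Return the label of the closest labelled example (1-nearest neighbour)."""
    _, label = min(labelled, key=lambda example: dist(example[0], features))
    return label

print(predict((4.8, 24.0)))
print(predict((4500.0, 310.0)))
```

In practice the features would usually be scaled before distances are compared, since here the weight dominates the distance.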

Chi-Square Test: The chi-square statistic is a statistical filter method that can be used to assess the association between features using their frequency distributions. It is a number that tells us how much difference exists between the observed counts and the counts expected if there were no relationship at all in the population.
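The comparison of observed and expected counts can be sketched as follows. The 2x2 table is hypothetical (its row and column meanings are assumptions chosen for illustration), and `chi_square_statistic` is an illustrative helper, not a library function. Expected counts are computed under the assumption that the two features are unrelated.

```python
# Hypothetical 2x2 table of observed counts (illustrative numbers only).
# Rows: previously insured (no, yes); columns: response (rejected, accepted).
observed = [
    [10, 20],
    [20, 10],
]

def chi_square_statistic(table):
    """Return (statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count if the two features were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (obs - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof

stat, dof = chi_square_statistic(observed)
print(f"chi-square = {stat:.3f}, degrees of freedom = {dof}")
```

A larger statistic means the observed counts sit further from what independence would predict.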

Feature Engineering: Feature engineering means creating new features by exploiting the knowledge gained during exploratory data analysis and domain knowledge of the dataset, and applying encoding techniques such as stage, label, dummy (one-hot), frequency and target encoding.
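Four of the encodings named above (label, dummy/one-hot, frequency and target) can be sketched in plain Python. The column name `vehicle_age`, its categories and the `response` values are hypothetical and serve only to show what each encoding produces.

```python
from collections import Counter, defaultdict

# Hypothetical categorical feature and binary target (1 = accepted).
vehicle_age = ["<1y", "1-2y", ">2y", "1-2y", "<1y", "1-2y"]
response = [0, 1, 1, 0, 0, 1]

categories = sorted(set(vehicle_age))

# Label encoding: each category becomes an integer code.
label_map = {cat: code for code, cat in enumerate(categories)}
label_encoded = [label_map[v] for v in vehicle_age]

# Dummy (one-hot) encoding: one 0/1 column per category.
one_hot = [[int(v == cat) for cat in categories] for v in vehicle_age]

# Frequency encoding: each category becomes its relative frequency.
counts = Counter(vehicle_age)
freq_encoded = [counts[v] / len(vehicle_age) for v in vehicle_age]

# Target encoding: each category becomes the mean target value in that category.
sums = defaultdict(float)
for v, y in zip(vehicle_age, response):
    sums[v] += y
target_map = {cat: sums[cat] / counts[cat] for cat in categories}
target_encoded = [target_map[v] for v in vehicle_age]

print(label_map)
print(target_map)
```

When target encoding is used for modelling, the category means are normally computed on the training data only, so that information from the target does not leak into evaluation.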

Critical Value: The p-value is the probability of getting results at least as extreme as the results actually observed during the test, if the null hypothesis is correct. Since the p-value is just a value, we need to compare it with the critical value (α). If p-value > α (critical value), fail to reject the null hypothesis of the statistical test. If p-value ≤ α (critical value), reject the null hypothesis of the statistical test.
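The decision rule can be written as a small function. This is a sketch: the name `decide` is hypothetical, and the default threshold of 0.05 is an assumed common choice, not a value given in the chapter.

```python
def decide(p_value, alpha=0.05):
    """Reject the null hypothesis when p_value <= alpha, otherwise fail to reject."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    if p_value <= alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"

for p in (0.003, 0.05, 0.20):
    print(p, "->", decide(p))
```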

Univariate Analysis: The objective of univariate analysis is to describe the data, summarize it and discover the pattern(s) present in it. Uni means one and variate means variable, so univariate analysis involves a single variable; it does not find relationships between variables.
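A minimal sketch of univariate summaries, using only the Python standard library. The `ages` and `responses` values are hypothetical; each variable is summarized on its own, with no attempt to relate one to the other.

```python
import statistics
from collections import Counter

# Hypothetical values for two single variables, each examined on its own.
ages = [23, 35, 41, 29, 52, 35, 47, 61, 35, 28]
responses = ["rejected", "accepted", "rejected", "rejected", "accepted",
             "rejected", "rejected", "accepted", "rejected", "rejected"]

# Numeric variable: range, centre and spread.
age_summary = {
    "min": min(ages),
    "max": max(ages),
    "mean": statistics.mean(ages),
    "median": statistics.median(ages),
    "stdev": statistics.stdev(ages),
}

# Categorical variable: frequency of each value.
response_counts = Counter(responses)

print(age_summary)
print(response_counts)
```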

Bivariate Analysis: Bi means two and variate means variable. Bivariate analysis discovers relationships between two variables (features). It can be used to find the relationship between a feature in the dataset and the target feature (variable).
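One simple way to relate a feature to the target is to compare the acceptance rate across the feature's categories. The sketch below assumes a hypothetical `vehicle_damage` feature and a binary response (1 = accepted); the records are made up for illustration.

```python
from collections import defaultdict

# Hypothetical records: (vehicle_damage, response), where response 1 = accepted.
records = [
    ("yes", 1), ("yes", 0), ("yes", 1), ("yes", 1),
    ("no", 0), ("no", 0), ("no", 1), ("no", 0),
]

totals = defaultdict(int)
accepted = defaultdict(int)
for damage, response in records:
    totals[damage] += 1
    accepted[damage] += response

# Share of accepted offers within each category of the feature.
acceptance_rate = {group: accepted[group] / totals[group] for group in totals}
print(acceptance_rate)
```

A clear gap between the groups' rates suggests the feature may be related to the target; whether the gap is statistically meaningful is a separate question for a hypothesis test.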

Machine Learning: Machine learning enables computers to learn. It uses large amounts of data and learning algorithms. The algorithms train themselves on this data in order to learn without being explicitly programmed.

Exploratory Data Analysis (EDA): Preliminary analysis of a dataset to find important measures, metrics, features and relationships between measures, so that we can gain insight into trends and patterns and detect outliers in the dataset.

Reinforcement Learning: Reinforcement learning is based on interaction with the environment. In this type of learning, the machine learns to react to an environment on its own. Reinforcement learning is useful in fields such as robotics and gaming.

Z-Test: The z-test is a statistical way of testing a hypothesis. It is used when the sample size is large, n ≥ 30.
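As a worked sketch, a one-sample z-test computes z = (sample mean - hypothesized mean) / (standard deviation / sqrt(n)) and converts it to a two-sided p-value using the standard normal distribution. The function name and the numbers are hypothetical, and the sketch assumes the population standard deviation is known or well approximated at large n.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(sample_mean, hypothesized_mean, std_dev, n):
    """Return (z, two-sided p-value) for a one-sample z-test."""
    if n < 30:
        raise ValueError("this sketch follows the large-sample rule n >= 30")
    standard_error = std_dev / sqrt(n)
    z = (sample_mean - hypothesized_mean) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical numbers: sample of 100 with mean 52, tested against a mean of 50.
z, p = one_sample_z_test(sample_mean=52.0, hypothesized_mean=50.0, std_dev=10.0, n=100)
print(f"z = {z:.2f}, p-value = {p:.4f}")
```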

Unsupervised Learning: In unsupervised learning, the machine learns by itself without any supervision. No labelled dataset is available. Unsupervised learning is a kind of self-organized learning whose objective is to discover the underlying patterns.
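Clustering is one common way to discover patterns without labels. The sketch below is a plain one-dimensional k-means, chosen here purely for illustration; the premium values, the starting centroids and the helper name `k_means_1d` are all hypothetical.

```python
def k_means_1d(values, centroids, iterations=10):
    """Cluster numbers around the given starting centroids (plain k-means)."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: each centroid moves to the mean of its cluster.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # Converged: assignments can no longer change.
        centroids = new_centroids
    return centroids, clusters

# Hypothetical annual premiums with no labels attached.
premiums = [2600, 2700, 2500, 30000, 31000, 29500]
centroids, clusters = k_means_1d(premiums, centroids=[2600, 30000])
print(centroids)
print(clusters)
```

No label tells the algorithm which premiums belong together; the two groups emerge from the values alone.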
