An Insight on the Class Imbalance Problem and Its Solutions in Big Data


Khyati Ahlawat (University School of Information, Communication and Technology, Guru Gobind Singh Indraprastha University, India), Anuradha Chug (University School of Information, Communication and Technology, Guru Gobind Singh Indraprastha University, India) and Amit Prakash Singh (University School of Information, Communication and Technology, Guru Gobind Singh Indraprastha University, India)
Copyright: © 2021 | Pages: 11
DOI: 10.4018/978-1-7998-3444-1.ch002

Abstract

The expansion of data in volume, variety, or velocity is giving rise to big data. Learning from big data is challenging and beyond the capacity of conventional machine learning methods and techniques. Big data generated in real-time scenarios is generally imbalanced in nature, with an uneven distribution of classes. This adds complexity to learning from big data, since the underrepresented class is the more influential one and its correct classification is more critical than that of the overrepresented class. This chapter addresses the imbalance problem and its solutions in the context of big data, along with a detailed survey of work done in this area. It then presents an experimental view of solving the imbalanced classification problem, followed by a comparative analysis of different methodologies.
Chapter Preview

Background

This section gives an overview of the problem within the scope of this chapter and of the solutions available for it. It also includes a literature survey: a baseline study of significant work in the problem domain, covering all types of solutions.

As briefly discussed in the introduction, the imbalanced classification problem arises in any dataset whose class distribution is uneven. It can occur in binary as well as multi-class classification. The problem matters because the minority class is the one of main concern, so classifying its instances correctly is more important than classifying majority-class instances correctly. The available solutions fall into two categories:

  1. Data Level Solutions

  2. Algorithmic Solutions
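
To make the two categories concrete, here is a minimal Python sketch using only the standard library. It is an illustration under stated assumptions, not the chapter's experimental method: random oversampling stands in for a data-level solution, inverse-frequency class weighting stands in for an algorithmic solution, and the function names (`imbalance_ratio`, `random_oversample`, `class_weights`) are hypothetical names chosen for this example.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Return majority-class count divided by minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_oversample(rows, labels, seed=0):
    """Data-level sketch: duplicate randomly chosen rows of each smaller
    class until every class is as large as the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def class_weights(labels):
    """Algorithm-level sketch: weight each class inversely to its frequency,
    n_samples / (n_classes * class_count), so that errors on the minority
    class can be penalized more heavily by a cost-sensitive learner."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


# Toy data (assumed for illustration): 95 majority and 5 minority instances.
labels = [0] * 95 + [1] * 5
rows = [[i] for i in range(100)]

balanced_rows, balanced_labels = random_oversample(rows, labels)
weights = class_weights(labels)
```

The data-level route changes the training set and leaves the learner untouched, while the algorithmic route leaves the data untouched and changes how the learner treats each class. In the toy data above, the minority class receives a larger weight than the majority class, and the oversampled set has equal class counts.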
