Optimising Prediction in Overlapping and Non-Overlapping Regions

Sumana B.V., Punithavalli M.

Source Title: International Journal of Natural Computing Research (IJNCR) 9(1)

DOI: 10.4018/IJNCR.2020010104

Abstract

Researchers working with real-world classification data have identified the combination of class overlap, class imbalance, and high-dimensional data as a crucial problem and an important factor in degrading classifier performance. Hence, it has received significant attention in recent years. Misclassification often occurs in the overlapped region because there is no clear distinction between the class boundaries, and high-dimensional data with an imbalanced class proportion poses an additional challenge. Only a few studies have attempted to address all these issues simultaneously. Therefore, a model is proposed that first divides the data space into overlapped and non-overlapped regions using a K-means algorithm, then lets the classifier learn from the two regions separately, and finally combines the results. The experiment is conducted using the Heart dataset from the KEEL repository, and the results show that the proposed model improves the performance of the classifier in terms of accuracy, kappa, precision, recall, F-measure, FNR, FPR, and time.

Introduction

The real-time data accumulated in society through day-to-day activities such as credit card transactions, patient health records, failures in a manufacturing unit, medical diagnosis, detection of oil spills, and text classification are always overlapped and class imbalanced in nature (Sumana, 2016). In an imbalanced dataset, the classifier usually misclassifies minority class instances because it is biased by the heavily represented majority class instances, so its performance degrades. Misclassification frequently occurs in the overlapping region, and high-dimensional data is the main cause of class overlap. Class imbalance on its own is not a crucial problem, but the combination of class imbalance with class overlap and high-dimensional data is, and it is the cause of the degraded performance of the classifier (Sumana, 2016).

Data is said to be imbalanced if the classes in the data space are not represented in equal proportion. The class with the higher number of instances is called the majority class, and the class with the fewer instances is called the minority class. Class imbalance makes the classification task difficult: the classifier becomes biased towards the majority class because it does not get the information about the minority class that it needs to make accurate predictions. It therefore shows poor classification rates on the minority class, whose instances it treats as noise, and its performance degrades. A balanced dataset is therefore necessary for building a good prediction model, as most classifiers perform well when the number of instances in each class is approximately equal (Guo, 2016).
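
As a small illustration of the majority and minority class definitions above, the following sketch counts the instances per class and reports their ratio. The function name `imbalance_ratio` and the toy labels are assumptions made for this example and do not come from the article.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Return (majority_label, minority_label, majority_count / minority_count)."""
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("need at least two classes to measure imbalance")
    ordered = counts.most_common()  # sorted from most to least frequent
    majority_label, majority_count = ordered[0]
    minority_label, minority_count = ordered[-1]
    return majority_label, minority_label, majority_count / minority_count


# Hypothetical toy labels: 90 instances of one class, 10 of the other.
labels = ["healthy"] * 90 + ["disease"] * 10
result = imbalance_ratio(labels)
print(result)
```

A ratio close to 1 indicates a roughly balanced dataset; the larger the ratio, the more strongly the majority class dominates.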

When samples from different classes have similar characteristics, they do not form separate clusters and are not linearly separable; instead, a few samples overlap in the data space and are known as overlapping samples. Class imbalance is not a crucial problem in itself, but the combination of class overlap with class imbalance poses a new challenge and is the cause of the degraded performance of the classifier. Liu (2008) stated that the overlapping region contains data from more than one class and that misclassification often occurs near the class boundaries where overlap is present. Aida Ali (2015) suggested that high dimensionality with redundant or irrelevant features makes it difficult for the classifier to recognize the class boundaries and is hence one of the causes of class overlap.
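
The abstract describes dividing the data space into overlapped and non-overlapped regions with a K-means algorithm, but this preview does not give the exact rule. The sketch below is therefore only one possible reading, under the assumption that a cluster counts as overlapped when it contains samples of more than one class. The function names, the farthest-point initialisation, and the toy points are all illustrative choices, not the authors' method.

```python
import math


def kmeans(points, k, n_iter=50):
    """Plain Lloyd's k-means with a deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next centroid: the point farthest from all centroids chosen so far.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    assignment = None
    for _ in range(n_iter):
        new_assignment = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment


def overlap_mask(points, labels, k):
    """Flag each sample as overlapped if its cluster holds more than one class.

    ASSUMPTION: this mixed-cluster rule is hypothetical; the article preview
    does not state how the K-means output defines the overlapped region.
    """
    assignment = kmeans(points, k)
    classes_in_cluster = {}
    for a, y in zip(assignment, labels):
        classes_in_cluster.setdefault(a, set()).add(y)
    return [len(classes_in_cluster[a]) > 1 for a in assignment]


# Toy data: two pure groups and one group where both classes mix.
points = [(0, 0), (0, 1), (1, 0), (1, 1),          # class 0 only
          (10, 10), (10, 11), (11, 10), (11, 11),  # class 1 only
          (5, 5), (5, 6), (6, 5), (6, 6)]          # both classes
labels = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1]

mask = overlap_mask(points, labels, k=3)
overlapped = [(p, y) for p, y, m in zip(points, labels, mask) if m]
non_overlapped = [(p, y) for p, y, m in zip(points, labels, mask) if not m]
print(len(overlapped), len(non_overlapped))
```

Under this reading, a separate classifier could then be trained on each of the two subsets and their predictions combined, which is the overall flow the abstract outlines.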

Methods to Address Class Imbalance

Methods to overcome class imbalance fall into two categories: the data-level approach and the algorithmic-level approach. The data-level approach modifies the data, balancing it with sampling methods or synthetic data generation methods so that the classifier is not biased towards the majority class. The algorithmic-level approach instead modifies the classifier to overcome its bias towards majority class objects.
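
The text does not name a specific algorithmic-level technique. One common example is cost-sensitive weighting, where errors on rare classes cost more. The sketch below assumes inverse-frequency weights of the form n_samples / (n_classes * class_count); the function names and toy labels are illustrative only.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


def weighted_error(y_true, y_pred, weights):
    """Misclassification rate in which each sample counts with its class weight."""
    total = sum(weights[t] for t in y_true)
    wrong = sum(weights[t] for t, p in zip(y_true, y_pred) if t != p)
    return wrong / total


# Hypothetical toy case: a classifier that always predicts the majority class.
y_true = [0] * 90 + [1] * 10
y_pred = [0] * 100

weights = balanced_class_weights(y_true)
plain_error = sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)
cost_sensitive_error = weighted_error(y_true, y_pred, weights)
print(weights, plain_error, cost_sensitive_error)
```

The unweighted error rate makes the always-majority classifier look good, while the weighted error exposes that it gets the whole minority class wrong. A learner that minimises the weighted error has less incentive to ignore the minority class.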

Data Level Approach

Sampling methods are further divided into over-sampling, under-sampling, and hybrid methods. Under-sampling methods balance the class distribution by randomly eliminating samples of the majority class while retaining the minority class samples. Over-sampling methods balance the class distribution by randomly replicating existing samples of the minority class while retaining the majority class samples. The hybrid method combines the two, balancing the class distribution by randomly eliminating majority class samples and replicating minority class samples.
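
The three sampling strategies above can be sketched with a single helper that brings every class to a chosen target size. The function name `resample_to`, the `target` parameter, and the toy data are assumptions made for this illustration. Setting the target to the minority size gives random under-sampling, the majority size gives random over-sampling, and a value in between gives a hybrid.

```python
import random
from collections import Counter


def resample_to(samples, labels, target, seed=0):
    """Randomly resample so that every class ends up with exactly `target` samples."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    out = []
    for y, xs in by_class.items():
        if len(xs) >= target:
            # Under-sampling: randomly eliminate samples of the larger class.
            chosen = rng.sample(xs, target)
        else:
            # Over-sampling: keep every sample and randomly replicate existing ones.
            chosen = xs + [rng.choice(xs) for _ in range(target - len(xs))]
        out.extend((x, y) for x in chosen)
    rng.shuffle(out)
    return [x for x, _ in out], [y for _, y in out]


# Hypothetical toy data: 90 majority samples (label 0), 10 minority samples (label 1).
samples = list(range(100))
labels = [0] * 90 + [1] * 10

_, y_under = resample_to(samples, labels, target=10)   # under-sampling
_, y_over = resample_to(samples, labels, target=90)    # over-sampling
_, y_hybrid = resample_to(samples, labels, target=50)  # hybrid
print(Counter(y_under), Counter(y_over), Counter(y_hybrid))
```

Under-sampling discards majority information, and over-sampling repeats minority samples exactly, which can encourage overfitting. The hybrid target trades off the two.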

The synthetic data generation method artificially generates data using bootstrapping or k-nearest neighbours (KNN) to balance the class distribution. Examples include ROSE, ADASYN, SMOTE, MSMOTE, Borderline-SMOTE, SMOTE-TL, SMOTE-E, and selective pre-processing of imbalanced data (SPIDER).
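
To illustrate the nearest-neighbour idea behind SMOTE-style methods, the sketch below creates each synthetic point by interpolating between a minority sample and one of its k nearest minority neighbours. It is a simplified illustration and not a reference implementation of any of the listed algorithms. The function name `smote_like`, its parameters, and the toy points are assumptions.

```python
import math
import random


def smote_like(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority points by nearest-neighbour interpolation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        # The k minority samples closest to the chosen base point.
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical minority-class points in two dimensions.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
synthetic = smote_like(minority, n_new=6, k=2)
print(len(synthetic))
```

Unlike random over-sampling, the new points are not exact copies of existing samples. Each one lies on the line segment between two nearby minority samples.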
