The Use of Prediction Reliability Estimates on Imbalanced Datasets: A Case Study of Wall Shear Stress in the Human Carotid Artery Bifurcation

Domen Košir, Zoran Bosnic, Igor Kononenko
Copyright: © 2013 | Pages: 12
DOI: 10.4018/978-1-4666-2455-9.ch035

Abstract

Data mining techniques are extensively used on medical data, which is typically composed of many normal examples and few interesting ones. When presented with highly imbalanced data, some standard classifiers tend to ignore the minority class, which leads to poor performance. Various solutions have been proposed to counter this problem; random undersampling, random oversampling, and SMOTE (Synthetic Minority Oversampling Technique) are the best-known approaches. In recent years, several approaches for evaluating the reliability of individual predictions have been developed. Most recently, a simple and efficient approach based on the classifier’s class probability estimates was shown to outperform the other reliability estimates. The authors propose to use this reliability estimate to improve the SMOTE algorithm. In this study, they demonstrate the positive effects of the proposed algorithms on artificial datasets. The authors then apply the developed methodology to the problem of predicting the maximal wall shear stress (MWSS) in the human carotid artery bifurcation. The results indicate that it is feasible to improve the classifier’s performance by balancing the data with their versions of the SMOTE algorithm.

Introduction

Stroke risk is increased by many factors: age, systolic and diastolic hypertension, diabetes, cigarette smoking, high cholesterol levels, arrhythmia, etc. Changes in the geometric dimensions of the vessel in the region of the carotid artery bifurcation affect the blood flow and may lead to the stenosis process (Schulz & Rothwell, 2001).

Stenosis is a narrowing of the inner channel (lumen) of a blood vessel. Carotid artery stenosis is usually caused by cholesterol plaque buildup. The plaque makes the blood flow faster and more turbulent, and irregular blood flow can cause pieces of plaque to break off and block smaller arteries in the brain, partially or completely restricting blood flow to the parts of the brain that those vessels supply. The risk of this happening is especially high in patients with arrhythmia.

The common carotid artery supplies the neck, head, and brain with oxygenated blood. In the neck, it bifurcates into the internal and external carotid arteries. The blood flow in this section was simulated using a 3D model in order to analyze the influence of geometric parameters on the maximal wall shear stress (MWSS) in the human carotid artery bifurcation (Radović & Filipović, 2010).

We transformed the regression problem of predicting the MWSS value into two classification problems by setting two thresholds on the wall shear stress values. We try to predict the MWSS level using the 3D model’s geometric parameters, but both classification datasets (mwss95 and mwss99) suffer from the class imbalance problem.
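
As an illustration, a minimal sketch of this thresholding step is given below; the threshold values and all names in it are hypothetical, since the actual cut-offs behind the mwss95 and mwss99 datasets are not specified here.

import numpy as np

# Hypothetical cut-offs; the actual thresholds used to build the
# mwss95 and mwss99 datasets are not given in this preview.
THRESHOLD_A = 7.5
THRESHOLD_B = 9.0

def to_binary_labels(mwss_values, threshold):
    # 1 = MWSS above the threshold (the rare, interesting class),
    # 0 = MWSS below the threshold (the common class).
    return (np.asarray(mwss_values, dtype=float) > threshold).astype(int)

# mwss is the vector of simulated MWSS values:
# y95 = to_binary_labels(mwss, THRESHOLD_A)
# y99 = to_binary_labels(mwss, THRESHOLD_B)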

A large imbalance in the data can cause some classifiers to perform poorly. Imbalanced data is common in real-world problems, such as image analysis (Kubat, Holte & Matwin, 1998), fraud detection (Fawcett & Provost, 1996), text classification (Zheng, Wu & Srihari, 2004), and medicine (Mac Namee, Cunningham, Byrne & Corrigan, 2002; Cohen, Hilario, Sax, Hugonnet & Geissbuhler, 2006). When the majority examples heavily outnumber the minority examples, some classifiers tend to ignore the minority class. The classification accuracy measure, however, does not reveal this. For instance, a simple classifier that always predicts the majority class would show 99% classification accuracy on a dataset that consists of 99% majority examples and 1% minority examples, yet it would of course be useless. In this study, we therefore focus on the more informative AUC (Area Under the ROC Curve) instead of relying on classification accuracy.
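
To make the difference between the two measures concrete, the following sketch (ours, not taken from the chapter) scores such a trivial majority-class predictor with scikit-learn:

import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

# Synthetic labels: 99% majority class (0), 1% minority class (1).
y_true = np.array([0] * 990 + [1] * 10)

# A trivial classifier that always predicts the majority class.
y_pred = np.zeros_like(y_true)   # hard class predictions
y_score = np.zeros(len(y_true))  # constant class probability estimates

print(accuracy_score(y_true, y_pred))  # 0.99 -- looks excellent
print(roc_auc_score(y_true, y_score))  # 0.5  -- no better than random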

Several existing approaches enable even imbalance-sensitive classifiers to successfully predict minority examples. Some of the proposed solutions work at the algorithmic level: they modify existing classifiers or introduce new algorithms that are not sensitive to imbalanced learning data. Other approaches work at the data level: they modify the data itself in order to soften the ratio between the numbers of majority and minority examples.

The imbalance in the data can be countered, for example, by randomly removing majority examples from the dataset (random undersampling) or by replicating randomly chosen minority examples (random oversampling). Both are very straightforward algorithms for changing the numbers of majority and minority examples, and the effects of data undersampling and oversampling were extensively studied in the last decade (Estabrooks & Japkowicz, 2001; Chawla, Japkowicz & Kolcz, 2004).
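
As a minimal sketch (ours; it works on arrays of example indices and uses hypothetical helper names), both techniques reduce to random sampling:

import numpy as np

rng = np.random.default_rng(0)

def random_undersample(majority_idx, n_minority):
    # Keep only a random subset of majority examples, as many as
    # there are minority examples (sampling without replacement).
    return rng.choice(majority_idx, size=n_minority, replace=False)

def random_oversample(minority_idx, n_majority):
    # Replicate randomly chosen minority examples (sampling with
    # replacement) until they match the number of majority examples.
    return rng.choice(minority_idx, size=n_majority, replace=True)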

A new algorithm called SMOTE (Chawla, Bowyer, Hall & Kegelmeyer, 2002) was introduced in 2002 (see Algorithm 1). Instead of deleting or duplicating random examples in the dataset, this algorithm generates synthetic examples from the existing minority examples: for every synthetic example, a minority example and one of its nearest neighbors are combined to generate a new minority example. Several researchers have used this algorithm in their work and developed new variations of it (Chawla, Lazarevic, Hall & Bowyer, 2003; Akbani, Kwek & Japkowitz, 2004; Han, Wang & Mao, 2005).

Algorithm 1. Algorithm SMOTE
Until enough synthetic examples are generated do:
    Select a minority example A.
    Select one of the example's nearest neighbors B.
    Select a random weight W between 0 and 1.
    Create a new synthetic example C.
    For every attribute do:
        attValue_C = attValue_A + (attValue_B - attValue_A) * W
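
A runnable Python version of this loop might look as follows; this is our sketch written directly from the pseudocode above (using a brute-force nearest-neighbor search and an assumed neighborhood size k), not the authors' implementation.

import numpy as np

def smote(minority, n_synthetic, k=5, seed=0):
    # minority: 2D array of minority examples (rows = examples).
    minority = np.asarray(minority, dtype=float)
    rng = np.random.default_rng(seed)
    synthetic = []
    for _ in range(n_synthetic):
        # Select a minority example A.
        a = minority[rng.integers(len(minority))]
        # Find A's k nearest minority neighbors (brute force),
        # skipping A itself, and select one of them as B.
        distances = np.linalg.norm(minority - a, axis=1)
        neighbors = np.argsort(distances)[1:k + 1]
        b = minority[rng.choice(neighbors)]
        # Select a random weight W between 0 and 1 and interpolate
        # every attribute: C = A + (B - A) * W.
        w = rng.random()
        synthetic.append(a + (b - a) * w)
    return np.array(synthetic)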
