1. Introduction
The current era is experiencing a data explosion everywhere and in every form, termed Big Data (Hu, Wen, Chua, et al., 2014). Data mining over such versatile big data requires addressing challenges at the data, model, and system levels and has become a very compelling task (Tsai, Lai, Chao, et al., 2015; Wu, Zhu, Wu, et al., 2014). Big data is widely used in prediction-based systems such as short-term load forecasting (Zhang, Cheng, Liu, et al., 2014), traumatic brain injury survival rate prediction (Rodger, 2015), and prediction over noisy big data (Yang & Fong, 2012). Machine learning techniques are proving highly efficient in such domains when modified and adapted to the MapReduce framework (Bechini, Marcelloni & Segatori, 2016; Hochbaum & Baumann, 2014). Classification, or supervised machine learning, has also proved applicable to uncertainty reduction in big data (Wang, He, Chow, et al., 2015) and in fuzzy systems (Fernández, Carmona, Jesus, et al., 2016; He, Wang, Zhuang, et al., 2015). Whereas classification algorithms typically require all data in the same format and on the same machine (Hochbaum & Baumann, 2014), Petuum, a platform for machine learning, is capable of handling big data in a distributed manner (Xing, Ho, Dai, et al., 2015), though it is immature compared to Spark and Hadoop. Apache Spark and Mahout are very popular tools that use the machine learning library MLlib to address big data problems (Landset, Khoshgoftaar, Richter, et al., 2015). Researchers have also surveyed the current state of the art of machine learning for sustainable data modeling in big data (Al-Jarrah, Yoo, Muhaidat, et al., 2015).
Usually, different classifiers learn according to their pre-defined algorithmic formulation and concept, but some external factors also affect the learning process. One such factor is class distribution, the proportion of instances of each class in a dataset (Galar, Fernández, Barrenechea, et al., 2016). When this distribution is not balanced, the dataset is termed an imbalanced dataset, and learning performed on such data is known as imbalanced classification. Classifier learning from imbalanced datasets has become a hot research topic in the big data mining discipline.
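As a minimal sketch of the class distribution notion above, the proportion of each class can be computed directly from the label vector; the labels below are hypothetical, chosen only to illustrate a 95:5 imbalance:

```python
from collections import Counter

# Hypothetical binary labels: 0 = majority class, 1 = minority class
labels = [0] * 95 + [1] * 5

counts = Counter(labels)
total = len(labels)

# Class distribution: proportion of instances of each class in the dataset
distribution = {cls: n / total for cls, n in counts.items()}
print(distribution)  # {0: 0.95, 1: 0.05}
```

When one of these proportions is much smaller than the other, the dataset is imbalanced in the sense defined above.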
The problem of an imbalanced dataset occurs when instances of one class, which is of main interest in the application field, are under-represented compared to the other class. The Imbalance Ratio (IR) (López, Fernández, García, et al., 2013) is used to quantify the extent of imbalance in a dataset. Classifiers normally tend to ignore the minority class samples, treating them as outliers or noise, and the whole classification process loses its meaning and usefulness in such a case. Consider an example from medical diagnosis (Ganganwar, 2012), where the inputs are various patient parameters and the task is to predict whether each patient is suffering from cancer. Assume there are 10000 non-cancer patients and 10 cancer patients, and two classifiers are learned for this problem. Classifier 1 classifies 7 out of 10 cancer patients as fit and 10 out of 10000 other patients as cancer patients. Classifier 2, on the other hand, classifies 2 out of 10 cancer patients as fit and 100 out of 10000 other patients as cancer patients. Judged by the total number of misclassifications, classifier 1 is better than classifier 2: it makes 17 mistakes while classifier 2 makes 102. Focusing on correct identification of cancer patients, however, classifier 2 performs better, catching 8 of the 10 cases versus only 3. Consequently, in applications where correct classification of cancer patients is crucial, any accuracy-driven algorithm will still pick classifier 1 over classifier 2, which is a challenging problem.
The class imbalance problem becomes even more challenging when ordinary datasets expand exponentially into big data. To address class imbalance in the context of big data, traditional machine learning algorithms and classifiers need to be adapted to new big data technologies so that an efficient mechanism for classifier learning can be obtained.