MapReduce Implementation of a Multinomial and Mixed Naive Bayes Classifier

Sikha Bagui (The University of West Florida, USA), Keerthi Devulapalli (University of West Florida, USA) and Sharon John (University of West Florida, USA)
Copyright: © 2020 |Pages: 23
DOI: 10.4018/IJIIT.2020040101

Abstract

This study presents an efficient way to deal with discrete as well as continuous values in Big Data in a parallel Naïve Bayes implementation on Hadoop's MapReduce environment. Two approaches were taken: (i) discretizing continuous values using a binning method; and (ii) using a multinomial distribution for probability estimation of discrete values and a Gaussian distribution for probability estimation of continuous values. The models were analyzed and compared for performance with respect to run time and classification accuracy for varying data sizes, data block sizes, and map memory sizes.

Introduction

Naïve Bayes (NB) classification is robust and effective in practice, and over the years this classifier has found a wide variety of applications in complex domains such as text classification (Korpipaa et al., 2003; Yuan et al., 2012), document classification (Viegas et al., 2015), and sentiment analysis (Dei et al., 2007; Narayan et al., 2013). This probability-based classification model is easy to use with discrete values. In today's Big Data era, however, more and more data contains continuous values, and handling them has become a major challenge. The first implementation goal of the NB classifier in a Big Data environment is, of course, to use a parallel processing platform, and fortunately NB classifiers are naturally amenable to parallelization: the probability counts for each attribute can be estimated independently of one another.
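The per-attribute independence that makes NB training parallelizable can be sketched in MapReduce terms: the map step emits one (class, attribute, value) key per attribute of each record, and the reduce step sums the counts per key. The records and attribute names below are hypothetical toy data, not from the paper; this is a minimal single-process sketch of the counting pattern, not Hadoop code.

```python
from collections import Counter

# Hypothetical toy records: (class_label, {attribute: value}).
records = [
    ("spam", {"word": "free",    "caps": "yes"}),
    ("spam", {"word": "win",     "caps": "yes"}),
    ("ham",  {"word": "meeting", "caps": "no"}),
]

def map_counts(record):
    """Map step: emit one ((class, attribute, value), 1) pair per attribute.
    Each attribute's counts depend only on its own column, so the emitted
    keys can be counted independently and in parallel."""
    label, attrs = record
    for attr, value in attrs.items():
        yield (label, attr, value), 1

def reduce_counts(pairs):
    """Reduce step: sum the counts for each (class, attribute, value) key."""
    totals = Counter()
    for key, n in pairs:
        totals[key] += n
    return totals

counts = reduce_counts(kv for r in records for kv in map_counts(r))
# e.g. counts[("spam", "caps", "yes")] == 2
```

In a real Hadoop job the shuffle phase would route each key to a reducer; here a single Counter plays that role.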

In Hadoop’s MapReduce environment, the NB classification model for discrete values can be implemented with two MapReduce jobs: one for constructing the model (learning phase) and a second for applying the model to unlabeled data (test phase). The parallel NB implementation is not as straightforward for continuous values. For continuous values, there are two options: (i) discretizing the continuous values, or (ii) using the Gaussian distribution probability density function. In this paper, the first method is referred to as the Discrete/Multinomial NB model and the second method is referred to as the Mixed NB model:

  1. Discretizing the continuous values and then building the NB model with the discrete values. This option requires an extra pre-processing step of discretizing the continuous values. This extra step could be time consuming and resource intensive for Big Data. On the MapReduce platform, chained MapReduce jobs become resource intensive since the output of every MapReduce job has to be stored on Hadoop’s Distributed File System (HDFS). Depending on the chosen discretization algorithm, the discretization process can take one or more MapReduce jobs;

  2. Using the Gaussian distribution probability density function for continuous values. This model, referred to as the Mixed NB model, handles both continuous and discrete values. For discrete values, the Multinomial distribution is used for probability estimation, and for continuous values, the Gaussian distribution is used. This method does not require a pre-processing step: the NB model can be built in one MapReduce job and applied for classification in another. Hence this model requires fewer MapReduce jobs than the Discrete NB model.
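The Mixed NB estimation described in option 2 can be sketched as follows: the learning phase collects multinomial counts for discrete attributes and a per-class mean and variance for continuous attributes, and the test phase multiplies the resulting likelihoods together with the class prior. The data, attribute names, and Laplace-smoothing choice below are illustrative assumptions, not taken from the paper's implementation.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training rows: (class, discrete attrs, continuous attrs).
data = [
    ("yes", {"outlook": "sunny"}, {"temp": 30.0}),
    ("yes", {"outlook": "sunny"}, {"temp": 28.0}),
    ("no",  {"outlook": "rain"},  {"temp": 18.0}),
    ("no",  {"outlook": "rain"},  {"temp": 20.0}),
]

def train(rows):
    """Learning phase: multinomial counts for discrete attributes,
    per-class (mean, variance) for continuous attributes."""
    priors = Counter()
    disc = defaultdict(Counter)   # (class, attr) -> Counter of values
    cont = defaultdict(list)      # (class, attr) -> list of values
    for label, d, c in rows:
        priors[label] += 1
        for a, v in d.items():
            disc[(label, a)][v] += 1
        for a, v in c.items():
            cont[(label, a)].append(v)
    stats = {}
    for key, vals in cont.items():
        mean = sum(vals) / len(vals)
        var = sum((x - mean) ** 2 for x in vals) / len(vals)
        stats[key] = (mean, var)
    return priors, disc, stats

def gaussian(x, mean, var):
    """Gaussian probability density function."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def classify(priors, disc, stats, d, c):
    """Test phase: combine multinomial and Gaussian likelihoods per class."""
    total = sum(priors.values())
    best, best_p = None, -1.0
    for label, n in priors.items():
        p = n / total
        for a, v in d.items():
            cnt = disc[(label, a)]
            # Laplace smoothing so unseen values do not zero out the product.
            p *= (cnt[v] + 1) / (sum(cnt.values()) + len(cnt) + 1)
        for a, x in c.items():
            mean, var = stats[(label, a)]
            p *= gaussian(x, mean, var)
        if p > best_p:
            best, best_p = label, p
    return best

model = train(data)
# classify(*model, {"outlook": "sunny"}, {"temp": 29.0}) -> "yes"
```

Because every likelihood term is a per-attribute count or statistic, the `train` aggregation maps directly onto one MapReduce job and `classify` onto a second, as the paper describes.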

The main contribution of this study is an efficient way to deal with both discrete and continuous values in Big Data. The paper implements the parallel versions of both the Discrete and Mixed NB models on Hadoop’s MapReduce platform and compares their performance on a very large dataset containing a mixture of continuous and discrete values.

The organization of the paper is as follows. Background on the Naïve Bayes model is presented in section 2, followed by the theory behind the two event models in its subsections. Section 3 presents a review of the literature. Implementation details are discussed in section 4. Section 5 presents the experimental setup, results, and a discussion of the results, and section 6 presents the conclusions of the study.


With respect to parallel implementations of the NB algorithm, He et al. (2010) and Zhon et al. (2012) implemented the NB classifier in a parallelized environment to handle large categorical datasets, achieving significant efficiency gains over serial NB algorithms.
