Reliable Distributed Fuzzy Discretizer for Associative Classification of Big Data


Hepzi Jeya Pushparani, Nancy Jasmine Goldena
Copyright © 2022 | Pages: 13
DOI: 10.4018/IJIRR.289572

Abstract

Data mining is an essential task because the digital world creates huge volumes of data daily. Associative classification is a data mining task used to classify data according to the demands of knowledge users. Most associative classification algorithms cannot analyze big data, which is largely continuous in nature. This motivates both an analysis of existing discretization algorithms, which convert continuous data into discrete values, and the development of a novel discretizer, the Reliable Distributed Fuzzy Discretizer, for big data sets. Many discretizers suffer from over-splitting of partitions. The proposed method is implemented in a distributed fuzzy environment and avoids over-splitting of partitions by introducing a novel stopping criterion. The proposed discretization method is compared with an existing distributed fuzzy partitioning method and achieves good accuracy in the performance of associative classifiers.
Article Preview

Introduction

Every second, the world creates a large volume of data across different domains; an International Data Corporation (IDC) study forecasts that the global datasphere will grow from 33 zettabytes (ZB) in 2018 to 175 ZB by 2025. Volumes of data beyond the storage and processing capacities of conventional systems are known as big data (Minelli et al., 2013). Real-world data is categorical, numerical, or continuous and comes in various formats, and a key task is extracting information from it. Classification algorithms have been developed to meet this growing demand. The art of integrating frequent pattern mining and classification is known as associative classification (Abdelhamid et al., 2012; Baralis & Garza, 2012). Many studies have shown that associative classification has specific advantages over traditional classification approaches such as decision trees and rule induction (Wedyan, 2014). First, associations are extracted from the dataset using frequent pattern mining algorithms (Aggarwal et al., 2014), and then the classification rules are created. Most frequent pattern mining algorithms work only on categorical attributes, so in order to improve the speed and accuracy of an associative classifier, an efficient discretizer is required to discretize real-valued data.
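To make this two-phase idea concrete, the following minimal sketch mines class association rules from a toy categorical dataset: frequent itemsets are counted first, and those that predict a class label with sufficient confidence become classification rules. The dataset, thresholds, and helper names are illustrative assumptions, not the algorithm used in this work.

```python
# A minimal sketch of the two-phase associative classification idea:
# (1) mine frequent itemsets from categorical transactions, then
# (2) keep those that predict a class label with sufficient confidence.
# All data, thresholds, and names here are illustrative only.
from itertools import combinations
from collections import defaultdict

transactions = [
    ({"outlook=sunny", "wind=weak"}, "play"),
    ({"outlook=sunny", "wind=strong"}, "no-play"),
    ({"outlook=rain", "wind=weak"}, "play"),
    ({"outlook=rain", "wind=strong"}, "no-play"),
]

MIN_SUPPORT = 0.25    # fraction of transactions containing the itemset
MIN_CONFIDENCE = 0.9  # P(class | itemset)

def mine_class_association_rules(data, min_sup, min_conf):
    n = len(data)
    itemset_count = defaultdict(int)
    itemset_class_count = defaultdict(lambda: defaultdict(int))
    # Phase 1: count candidate itemsets (sizes 1 and 2 for brevity).
    for items, label in data:
        for size in (1, 2):
            for combo in combinations(sorted(items), size):
                itemset_count[combo] += 1
                itemset_class_count[combo][label] += 1
    # Phase 2: turn frequent itemsets into classification rules.
    rules = []
    for itemset, count in itemset_count.items():
        if count / n < min_sup:
            continue
        for label, c in itemset_class_count[itemset].items():
            confidence = c / count
            if confidence >= min_conf:
                rules.append((itemset, label, count / n, confidence))
    return rules

for antecedent, label, sup, conf in mine_class_association_rules(
        transactions, MIN_SUPPORT, MIN_CONFIDENCE):
    print(f"{set(antecedent)} -> {label} (support={sup:.2f}, confidence={conf:.2f})")
```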

Discretization is a data preprocessing task that transforms continuous features into discrete ones, helping to enhance learning performance. Most data mining algorithms work on discrete values, so discretization is carried out prior to classification. Supervised discretization methods use class information to set partition boundaries, while unsupervised methods pick cut points without using class labels. Entropy-based discretization uses class information to compute and evaluate split points; it is therefore supervised and proceeds top-down. Association rule learners prefer multivariate discretization, which can capture interdependencies between attributes, whereas univariate discretization discretizes each attribute in isolation and tends to produce unsatisfactory association rules (Ishibuchi et al., 2001).
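As a concrete illustration of entropy-based, supervised, top-down cut-point selection, the sketch below scores each candidate boundary of a single attribute by the class-entropy reduction (information gain) it yields. The data and helper names are illustrative assumptions.

```python
# A minimal sketch of entropy-based (supervised, top-down) cut-point selection:
# each candidate boundary is scored by the reduction in class entropy it produces.
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_cut_point(values, labels):
    """Return the boundary that maximizes information gain for one attribute."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best_gain, best_cut = 0.0, None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between identical values
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [l for v, l in pairs[:i]]
        right = [l for v, l in pairs[i:]]
        gain = base - (len(left) / len(pairs)) * entropy(left) \
                    - (len(right) / len(pairs)) * entropy(right)
        if gain > best_gain:
            best_gain, best_cut = gain, cut
    return best_cut, best_gain

# Illustrative data: the classes separate cleanly around 2.55.
values = [1.2, 1.5, 2.0, 3.1, 3.4, 4.0]
labels = ["a", "a", "a", "b", "b", "b"]
print(best_cut_point(values, labels))  # expected cut near 2.55
```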

Discretization with fuzzy sets is known as fuzzy discretization, which resolves the soft boundary problem. Fuzzy discretization first discretizes quantitative attribute values into intervals (Ishibuchi et al., 2001). Each cut point is associated with a membership function, which determines the degree to which each attribute value belongs to each interval. In fuzzy discretization, a value can therefore be assigned to more than one interval at the same time, with varying degrees of membership. The fuzzy discretization process has the following steps: (1) cut points are identified, (2) partitions are created based on the cut points, and (3) attribute values are converted into fuzzified values using a triangular membership function.
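The sketch below illustrates these three steps for a single attribute: cut points, together with the domain bounds, define overlapping triangular fuzzy sets, and each raw value is mapped to a membership degree in every set. The cut points and the example value are illustrative assumptions, not those produced by the proposed discretizer.

```python
# A minimal sketch of fuzzification with triangular membership functions.
# The partition peaks (domain bounds plus cut points) are assumptions for
# illustration; a real discretizer would learn the cut points from data.
def fuzzify(value, peaks):
    """Membership degree of `value` in each triangular fuzzy set.

    `peaks` are the partition centres: the domain bounds plus the cut points.
    Neighbouring peaks serve as the feet of each triangle, so the degrees of
    adjacent sets always sum to 1 (a strong fuzzy partition).
    """
    degrees = []
    for i, peak in enumerate(peaks):
        left = peaks[i - 1] if i > 0 else peak               # lower shoulder
        right = peaks[i + 1] if i < len(peaks) - 1 else peak  # upper shoulder
        if value <= left:
            degrees.append(1.0 if i == 0 else 0.0)
        elif value >= right:
            degrees.append(1.0 if i == len(peaks) - 1 else 0.0)
        elif value <= peak:
            degrees.append((value - left) / (peak - left))
        else:
            degrees.append((right - value) / (right - peak))
    return degrees

# Example: cut points 3.0 and 6.0 plus domain bounds 0.0 and 9.0 give four
# triangular sets; the value 4.0 belongs partly to the set peaked at 3.0 and
# partly to the set peaked at 6.0.
print(fuzzify(4.0, [0.0, 3.0, 6.0, 9.0]))  # [0.0, 0.667, 0.333, 0.0]
```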

Classical data preprocessing techniques do not scale well when managing large volumes of data. To deal with big data, scalable distributed techniques have been developed. The first distributed programming techniques to tackle this problem were MapReduce (Dean & Ghemawat, 2004) and its open-source implementation, Apache Hadoop. Apache Spark (Karau et al., 2015) is a fast, in-memory engine for large-scale data processing. Thanks to this capability, many machine learning (ML) workloads can be sped up, which has made Spark especially popular among researchers and business experts in machine learning. Our main objective is to show that the proposed discretization algorithm, the Reliable Distributed Fuzzy Discretizer (RDFD), can be parallelized in these frameworks, providing strong discretization solutions for big data analytics. An efficient discretizer yields good classification accuracy in association rule mining. To demonstrate the effectiveness of the proposed discretizer, RDFD is compared with the distributed MDLP discretizer.
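As an illustration of how a discretizer can be parallelized on Spark, the PySpark sketch below groups the values of each attribute and derives simple equal-frequency candidate cut points per attribute in parallel. It is only a generic pattern under assumed attribute names and bin counts; it is not the proposed RDFD algorithm or the distributed MDLP discretizer.

```python
# A minimal PySpark sketch of a distributed discretization pattern:
# (attribute, value) records are grouped by attribute and candidate cut
# points are computed per attribute in parallel. Attribute names, values,
# and the bin count are illustrative assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("discretizer-sketch").getOrCreate()

# Toy continuous data distributed across workers.
rows = [("age", 23.0), ("age", 31.0), ("age", 47.0), ("age", 52.0),
        ("income", 1800.0), ("income", 2500.0), ("income", 4100.0)]
rdd = spark.sparkContext.parallelize(rows)

def candidate_cuts(values, n_bins=2):
    """Equal-frequency candidate cut points for one attribute (runs per key)."""
    vals = sorted(values)
    step = max(1, len(vals) // n_bins)
    return [(vals[i - 1] + vals[i]) / 2 for i in range(step, len(vals), step)]

# Group the values of each attribute and derive its cut points in parallel.
cuts = (rdd.groupByKey()
           .mapValues(lambda vals: candidate_cuts(list(vals)))
           .collectAsMap())
print(cuts)  # e.g. {'age': [39.0], 'income': [2150.0, 3300.0]}

spark.stop()
```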
