
Frequent Itemset Mining in Large Datasets a Survey

Amrit Pal, Manish Kumar

Source Title: International Journal of Information Retrieval Research (IJIRR) 7(4)

DOI: 10.4018/IJIRR.2017100103

Abstract

Frequent itemset mining is a well-known area of data mining. Most of the available frequent itemset mining techniques require complete information about the data in order to generate association rules. The amount of data is increasing day by day and taking the form of Big Data, which requires changes to the algorithms so that they can work on such large-scale data. Parallel implementation of the mining techniques can provide a solution to this problem. This paper surveys frequent itemset mining techniques that can be used in a parallel environment. Programming models such as MapReduce provide an efficient architecture for working with Big Data; the paper also discusses the issues and feasibility of implementing these techniques in such an environment.

Introduction

The amount of data is increasing day by day, and this growth poses some basic challenges for frequent itemset mining algorithms. As the size of the data increases, the time required to process it also increases. Millions of customers visit Walmart daily, resulting in millions of transactions. Every hour Walmart generates approximately 2.5 petabytes of data (DeZyre, 2016). Social networking websites generate a huge amount of unstructured data daily. Managing this data using conventional techniques is a challenging task. When data grows so large that it becomes difficult to manage with conventional data management systems, it is called Big Data (Manyika, 2011). Transaction datasets are also growing and taking the shape of Big Data. Algorithms such as Apriori (Agrawal, 1994) and FP-Growth are available for mining frequent itemsets from transactional datasets. Frequent itemsets can be mined using either a sequential or a parallel approach. Most of the available frequent itemset mining algorithms take the sequential approach.

There are some basic requirements for processing data for frequent itemsets: counting the total number of transactions, counting the different items in each itemset, maintaining a list of items, and performing a complete scan of the dataset. The central task in frequent itemset mining is calculating the support of each itemset. Algorithms must scan the complete transaction database to calculate the support count. Algorithms for frequent itemset mining fall into two categories, serial and parallel. A scalable model is required to manage data at this scale and to retrieve itemsets from it. Parallel implementation of frequent itemset mining (FIM) techniques can be effective for this purpose.
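As a minimal sketch of the support calculation described above, the following Python snippet counts every k-itemset in one full scan of a small transaction list. The function names, the toy transactions, and the use of relative support (count divided by the number of transactions) are assumptions made for illustration; they are not taken from the surveyed algorithms, and the brute-force enumeration shown here does none of the candidate pruning that Apriori or FP-Growth perform.

```python
from collections import Counter
from itertools import combinations


def support_counts(transactions, k):
    """Scan every transaction once and count each k-itemset it contains."""
    counts = Counter()
    for transaction in transactions:
        # Sort the distinct items so each itemset has one canonical tuple form.
        items = sorted(set(transaction))
        for itemset in combinations(items, k):
            counts[itemset] += 1
    return counts


def frequent_itemsets(transactions, k, min_support):
    """Return k-itemsets whose relative support is at least min_support."""
    total = len(transactions)
    counts = support_counts(transactions, k)
    return {
        itemset: count / total
        for itemset, count in counts.items()
        if count / total >= min_support
    }


# Hypothetical toy dataset, used only to exercise the functions.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]

pair_support = frequent_itemsets(transactions, k=2, min_support=0.5)
```

Each call performs a complete pass over the transactions, which is the cost that grows with the dataset size.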

Parallel algorithms for mining frequent itemsets are not easy to implement on very large datasets, and parallel computing raises several issues for frequent itemset mining (Kumar V, 1994). One key aspect of parallel computing is that each processor has its own memory and disk so that it can function independently. It is not easy to turn a sequential (serial) algorithm that scans the complete transactional database to generate the candidate set into a parallel algorithm. The benefit of a parallel algorithm is that it can overcome the limitations of a serial algorithm, such as limited memory or limited storage. As the dataset grows, it becomes difficult to hold the complete dataset and the intermediate data in the memory of a single system. Intermediate data here means intermediate counts, universal counts, and intermediate updates of temporary variables. Because the dataset is large, scalable memory is required to process it.
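One common way to work around the single-machine memory limit can be sketched as follows, in the spirit of the MapReduce model mentioned in the abstract: split the transactions into p partitions, count itemsets locally within each partition, then merge the partial counts. This is an illustrative sketch only. The function names and toy data are hypothetical, and the partitions are processed one after another in a single process here; in a real deployment each call to `local_counts` would run on a separate worker with its own memory.

```python
from collections import Counter
from functools import reduce
from itertools import combinations


def split_into_partitions(transactions, p):
    """Deal transactions round-robin into p partitions."""
    return [transactions[i::p] for i in range(p)]


def local_counts(partition, k):
    """Map step: count k-itemsets using only this partition's transactions."""
    counts = Counter()
    for transaction in partition:
        for itemset in combinations(sorted(set(transaction)), k):
            counts[itemset] += 1
    return counts


def merge_counts(left, right):
    """Reduce step: add two partial count tables together."""
    return left + right


def partitioned_support_counts(transactions, k, p):
    """Count k-itemsets by combining independent per-partition counts."""
    partitions = split_into_partitions(transactions, p)
    partials = [local_counts(part, k) for part in partitions]
    return reduce(merge_counts, partials, Counter())


# Hypothetical toy dataset, used only to exercise the functions.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]

global_counts = partitioned_support_counts(transactions, k=2, p=2)
```

Because support counts are additive, the merged result matches a single full scan regardless of how the transactions are partitioned. Each worker holds only its share of the transactions, although its table of partial counts can still grow large when there are many distinct itemsets.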

An algorithm is memory scalable (Anastasiu, 2014) if the amount of memory required per processor is a function of the following:

(1)

where n is the size of the data and p is the number of processes executed in parallel. As the number of processes grows, the amount of memory required per processor by a memory-scalable algorithm decreases.
