Apriori-based High Efficiency Load Balancing Parallel Data Mining Algorithms on Multi-core Architectures

Kun-Ming Yu (Department of Computer Science and Information Engineering, Chung Hua University, Hsinchu, Taiwan), Sheng-Hui Liu (School of Software, Harbin University of Science and Technology, Harbin, China), Li-Wei Zhou (School of Software, Harbin University of Science and Technology, Harbin, China) and Shu-Hao Wu (Department of Computer Science and Information Engineering, Chung Hua University, Hsinchu, Taiwan)
Copyright: © 2015 |Pages: 23
DOI: 10.4018/IJGHPC.2015040106

Abstract

Frequent pattern mining plays an essential role in knowledge discovery and data mining tasks that try to find usable patterns in databases. Efficiency is especially crucial for an algorithm that finds frequent itemsets in a large database. Numerous methods have been proposed to solve this problem, such as Apriori and FP-growth, which are regarded as fundamental frequent pattern mining methods. In addition, parallel computing architectures, such as cloud platforms, grid systems, and multi-core and GPU platforms, have become popular in data mining. However, most of the algorithms were proposed without considering the prevalent multi-core architectures. This study proposes two high-efficiency, load-balancing parallel data mining methods based on the Apriori algorithm for multi-core architectures. The main goal of the proposed algorithms was to reduce the massive number of duplicate candidates generated by previous methods. This goal was achieved: in a detailed experimental study, the algorithms performed better than the previous methods. The experimental results demonstrated that the computation time of the proposed algorithms decreased dramatically as more threads were used. Moreover, the observations showed that the workload was evenly balanced among the computing units.

1. Introduction

Data mining refers to the discovery of potentially useful hidden knowledge in huge amounts of data. Frequent itemset mining is a major domain of data mining that plays an important role in extracting meaningful information. The goal of Frequent Itemset Mining (FIM) is to find frequently appearing subsets within a database of sets. Important application areas include machine learning, web log mining, information retrieval, and business intelligence, among many others. As a result, frequent itemset mining over data streams has been one of the issues receiving the most attention in data mining research.

With the development of modern society, the size of various datasets has increased tremendously in recent years, as speedups in processing and communication have greatly improved the capability for data processing in all areas. Consequently, identifying important and meaningful information has become much more complex than before. One of the more challenging problems in data mining is discovering association rules from large databases of transactions, where each transaction consists of a set of items. Association rule mining (Agrawal et al., 1993; 1994) determines relations among itemsets in a database. The effectiveness of this technique depends on quickly and correctly finding interesting correlations between items in large databases. Because of its significance in many applications, numerous revised algorithms have been introduced, and yet association rule mining is still in need of more research. The mining of association rules includes two sub-procedures: (1) generating candidates and (2) finding all frequent itemsets, that is, itemsets whose support meets or exceeds a minimum support threshold. Applying the results of data mining to the planning of a company’s strategy could effectively increase profit and reduce risk.
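The two sub-procedures can be illustrated with a minimal sequential sketch of the textbook Apriori approach. This is not the parallel algorithm proposed in the article. The function name `apriori`, the parameter names, and the use of an absolute transaction count for `min_support` are assumptions made for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset.

    Assumption: min_support is an absolute number of transactions.
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count how many transactions contain each single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # (1) Candidate generation: join pairs of frequent (k-1)-itemsets,
        # keep unions of size k, and prune any candidate that has an
        # infrequent (k-1)-subset. A set removes duplicate candidates.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in frequent
                for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # (2) Support counting: scan the database once per level and keep
        # the candidates that reach the minimum support threshold.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1

    return result
```

Several different pairs of frequent (k-1)-itemsets can join into the same candidate, which is why the sketch collects candidates in a set. The support-counting loop dominates the running time on large databases, and a parallel variant would have to divide that work among threads.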

Computer hardware architecture has been revolutionized by larger main memory and by the evolution of processors from single-core to multi-core, many-core, or even cloud systems (Grossman et al., 2008; Hu, 2012; Meenakshi et al., 2010; Suneetha et al., 2011; Zhou et al., 2010). Traditional sequential data mining algorithms (Fakhrahmad et al., 2011; Jin, 2009; Prakash et al., 2010; Yu et al., 2010; Yun et al., 2005) take a tremendous amount of time to handle large datasets. These algorithms have not kept up with the latest computer architectures, and relatively little effort has been devoted to mapping them to high-performance platforms.
