BAHUI: Fast and Memory Efficient Mining of High Utility Itemsets Based on Bitmap

BAHUI: Fast and Memory Efficient Mining of High Utility Itemsets Based on Bitmap

Wei Song (College of Information Engineering, North China University of Technology, Beijing, China), Yu Liu (College of Information Engineering, North China University of Technology, Beijing, China) and Jinhong Li (College of Information Engineering, North China University of Technology, Beijing, China)
Copyright: © 2014 |Pages: 15
DOI: 10.4018/ijdwm.2014010101
OnDemand PDF Download:
$30.00
List Price: $37.50

Abstract

Mining high utility itemsets is one of the most important research issues in data mining owing to its ability to consider nonbinary frequency values of items in transactions and different profit values for each item. Although a number of relevant approaches have been proposed in recent years, they incur the problem of producing a large number of candidate itemsets for high utility itemsets. In this paper, the authors propose an efficient algorithm, namely BAHUI (Bitmap-based Algorithm for High Utility Itemsets), for mining high utility itemsets with bitmap database representation. In BAHUI, bitmap is used vertically and horizontally. On the one hand, BAHUI exploits a divide-and-conquer approach to visit itemset lattice by using bitmap vertically. On the other hand, BAHUI horizontally uses bitmap to calculate the real utilities of candidates. Using bitmap compression scheme, BAHUI reduces the memory usage and makes use of the efficient bitwise operation. Furthermore, BAHUI only records candidate high utility itemsets with maximal length, and inherits the pruning and searching strategies from maximal itemset mining problem. Extensive experimental results show that the BAHUI algorithm is both efficient and scalable.
Article Preview

Introduction

Frequent itemset mining (Agrawal & Srikant, 1994; Cho, Pei, & Wang, 2005; Han, Cheng, Xin, & Yan, 2007) is a fundamental and primary research topic in data mining. It started as a phase in the discovery of association rules (Ashrafi, Taniar, & Smith, 2007; Daly & Taniar, 2004; Jiang, Hu, Wei, Song, Han, & Liang, 2011; Taniar, Rahayu, Lee, & Daly, 2008; Tjioe & Taniar, 2005; Vo, Hong, & Le, 2013), but has since been generalized, independent of these, to many other patterns, for example, frequent sequences (Al-Mulla & Aghbari, 2011; Lu, Chen, Adjei, & Keech, 2008), episodes (Tatti & Cule, 2012), movement pattern (Taniar & Goh, 2007), and frequent subtree (Nguyen & Yamamoto, 2012; Xiao, Yao, & Yang, 2005).

One of the most important and popular applications for frequent itemset mining is market basket analysis. However, the itemsets found without considering their quantity and profit/weight may mask the really important information. In fact, the quantity and profit of items are important factors that cannot be ignored absolutely. Therefore, frequent itemset mining has the inevitable limitations for users who desire to find itemsets with high profit. For example, in retail applications, frequent itemsets identified by the traditional frequent itemset mining algorithms may contribute only a small portion of the overall revenue or profit, as high margin or luxury goods typically do not appear frequently in transactions. A similar problem occurs when data mining is applied within an enterprise to identify the most valuable client segments, or product combinations that contribute most to the company's bottom line. To address the limitation, utility mining (Chan, Yang, & Shen, 2003; Yu, Shao, Luo, & Zeng, 2009) emerges as an important topic in today’s data mining task.

In utility mining, each item in transactions has its own profit and can occur more than once. The utility of an itemset represents its importance or profitability to users. The utility values of items in a transaction database consist of two parts: one is the profit of distinct items, which is called external utility; while the other is the quantity of items in one transaction, which is called internal utility. The utility of an itemset is defined as the external utility multiplied by the internal utility. An itemset is called a high utility itemset (HUI) if its utility is no smaller than a user specified value of minimum utility.

Nevertheless, HUI mining is not an easy task. The difficulty is that it does not follow the downward closure property (Agrawal & Srikant, 1994) that plays an important role in frequent itemset mining, which means, a high utility itemset may consist of some low utility subsets. In other words, we cannot prune these itemsets with low utility directly, as it may cause a lack of actual high utility itemsets. To facilitate the task of HUI mining, Liu et al. (2005) defined high transaction-weighted utilization itemset, which satisfying downward closure property.

Earlier studies (Li, Yeh, & Chang, 2008; Yao, Hamilton, & Butz, 2004) suffered from the level-wise candidate generation-and-test problem, involving several database scans depending on the length of candidate itemsets. In view of this, some novel tree structures (Ahmed, Tanbeer, Jeong, & Lee, 2009; Tseng, Wu, Shie, & Yu, 2010) have been used for HUI mining. Based on the pattern growth approach (Han, Pei, Yin, & Mao, 2004), these algorithms are usually efficient than candidate generation-and-test methods.

Complete Article List

Search this Journal:
Reset
Open Access Articles: Forthcoming
Volume 13: 4 Issues (2017)
Volume 12: 4 Issues (2016)
Volume 11: 4 Issues (2015)
Volume 10: 4 Issues (2014)
Volume 9: 4 Issues (2013)
Volume 8: 4 Issues (2012)
Volume 7: 4 Issues (2011)
Volume 6: 4 Issues (2010)
Volume 5: 4 Issues (2009)
Volume 4: 4 Issues (2008)
Volume 3: 4 Issues (2007)
Volume 2: 4 Issues (2006)
Volume 1: 4 Issues (2005)
View Complete Journal Contents Listing