Article Preview
TopIntroduction
Frequent itemset mining (Agrawal & Srikant, 1994; Cho, Pei, & Wang, 2005; Han, Cheng, Xin, & Yan, 2007) is a fundamental and primary research topic in data mining. It started as a phase in the discovery of association rules (Ashrafi, Taniar, & Smith, 2007; Daly & Taniar, 2004; Jiang, Hu, Wei, Song, Han, & Liang, 2011; Taniar, Rahayu, Lee, & Daly, 2008; Tjioe & Taniar, 2005; Vo, Hong, & Le, 2013), but has since been generalized, independent of these, to many other patterns, for example, frequent sequences (Al-Mulla & Aghbari, 2011; Lu, Chen, Adjei, & Keech, 2008), episodes (Tatti & Cule, 2012), movement pattern (Taniar & Goh, 2007), and frequent subtree (Nguyen & Yamamoto, 2012; Xiao, Yao, & Yang, 2005).
One of the most important and popular applications for frequent itemset mining is market basket analysis. However, the itemsets found without considering their quantity and profit/weight may mask the really important information. In fact, the quantity and profit of items are important factors that cannot be ignored absolutely. Therefore, frequent itemset mining has the inevitable limitations for users who desire to find itemsets with high profit. For example, in retail applications, frequent itemsets identified by the traditional frequent itemset mining algorithms may contribute only a small portion of the overall revenue or profit, as high margin or luxury goods typically do not appear frequently in transactions. A similar problem occurs when data mining is applied within an enterprise to identify the most valuable client segments, or product combinations that contribute most to the company's bottom line. To address the limitation, utility mining (Chan, Yang, & Shen, 2003; Yu, Shao, Luo, & Zeng, 2009) emerges as an important topic in today’s data mining task.
In utility mining, each item in transactions has its own profit and can occur more than once. The utility of an itemset represents its importance or profitability to users. The utility values of items in a transaction database consist of two parts: one is the profit of distinct items, which is called external utility; while the other is the quantity of items in one transaction, which is called internal utility. The utility of an itemset is defined as the external utility multiplied by the internal utility. An itemset is called a high utility itemset (HUI) if its utility is no smaller than a user specified value of minimum utility.
Nevertheless, HUI mining is not an easy task. The difficulty is that it does not follow the downward closure property (Agrawal & Srikant, 1994) that plays an important role in frequent itemset mining, which means, a high utility itemset may consist of some low utility subsets. In other words, we cannot prune these itemsets with low utility directly, as it may cause a lack of actual high utility itemsets. To facilitate the task of HUI mining, Liu et al. (2005) defined high transaction-weighted utilization itemset, which satisfying downward closure property.
Earlier studies (Li, Yeh, & Chang, 2008; Yao, Hamilton, & Butz, 2004) suffered from the level-wise candidate generation-and-test problem, involving several database scans depending on the length of candidate itemsets. In view of this, some novel tree structures (Ahmed, Tanbeer, Jeong, & Lee, 2009; Tseng, Wu, Shie, & Yu, 2010) have been used for HUI mining. Based on the pattern growth approach (Han, Pei, Yin, & Mao, 2004), these algorithms are usually efficient than candidate generation-and-test methods.