Frequent Itemset Mining Algorithm Based on Linear Table

Jun Lu, Wenhe Xu, Kailong Zhou, Zhicong Guo
Copyright © 2023 | Pages: 21
DOI: 10.4018/JDM.318450

Abstract

Aiming at improving the speed of frequent itemset mining, a new frequent itemset mining algorithm based on a linear table is proposed. The linear table stores more shared information and reduces the number of scans over the original dataset. Operations such as pruning and grouping further optimize the algorithm. The mining speed varies with the dataset: (1) on sparse datasets, the algorithm is on average 45% faster than the bit combination algorithm and 2-3 times faster than the classic FP-growth algorithm; (2) on dense datasets, it is on average 50-70% faster than the classic FP-growth algorithm and tens of times faster than the bit combination algorithm. The algorithm integrates bit combinations with bitwise AND operations, which avoids recursive operations and lends itself to parallelization. Further analysis shows that the linear table is easy to split, which facilitates batch mining of data.
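The bit-combination technique referenced above can be illustrated with plain bit vectors: each item maps to a bit vector over transactions, and the support of an itemset is the population count of the bitwise AND of its items' vectors. The following is a minimal sketch of that general idea (the toy dataset and helper names are illustrative, not the paper's linear-table implementation):

```python
# Illustration of support counting with bitwise AND over transaction bit vectors.
# Each item maps to an integer whose bit t is set iff transaction t contains it.

def bit_vectors(transactions):
    """Build item -> bit-vector map; bit t is set iff item occurs in transaction t."""
    vecs = {}
    for t, items in enumerate(transactions):
        for item in items:
            vecs[item] = vecs.get(item, 0) | (1 << t)
    return vecs

def support(itemset, vecs, n_transactions):
    """Support of an itemset = popcount of the AND of its items' bit vectors."""
    acc = (1 << n_transactions) - 1  # all-ones mask: identity element for AND
    for item in itemset:
        acc &= vecs.get(item, 0)
    return bin(acc).count("1")

# Toy dataset (illustrative only).
transactions = [{"a", "b", "c"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
vecs = bit_vectors(transactions)
print(support({"a", "c"}, vecs, len(transactions)))  # "a" in {0,1,3}, "c" in all -> 3
```

Because support for any itemset reduces to independent AND/popcount passes over precomputed vectors, this style of counting avoids recursion and, as the abstract notes, parallelizes naturally across candidate itemsets.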
Article Preview

Apriori-like algorithms are frequent itemset mining algorithms with a relatively simple structure whose mining process uses breadth-first traversal. The IAA algorithm (Wu et al., 2009) uses a count-based, record-generating method to improve the performance of Apriori. Yuan et al. (2017) further use an overlapping strategy to compute support efficiently. The FP-growth algorithm (Han et al., 2000) constructs an FP-tree structure and mines it with depth-first traversal. The DPT algorithm (Qu et al., 2020) uses only a dynamic prefix tree structure and can directly output a prefix tree representing all frequent itemsets, avoiding the high cost of building many prefix trees. The negFIN algorithm (Aryabarzan et al., 2018) uses bitwise operations to extract the NegNodeset structure of an itemset and generates frequent itemsets over a set-enumeration tree, which is pruned by a promotion method. FDM (Gatuha & Jiang, 2017) is an algorithm based on the FP-tree and DIFFset data structures that can mine both long and short patterns from dense and sparse datasets. The SS-FIM algorithm (Djenouri et al., 2017) extracts frequent itemsets in a single scan of the transactional database and is less sensitive to changes in the user-specified minimum support threshold. The dFIN algorithm (Deng, 2016) represents itemsets with the DiffNodeset structure; it uses a hybrid search strategy to find frequent itemsets in a set-enumeration tree and, under certain conditions, enumerates frequent itemsets directly without generating candidates. PrePost (Deng et al., 2012) and PrePost+ (Deng et al., 2015) use a vertical N-list structure to store the key information of frequent itemsets and calculate support through the intersection of N-lists. In particular, PrePost+ adopts an effective pruning strategy, Children-Parent Equivalence pruning, which reduces the search space. The Nodeset (Deng & Lv, 2014) is a more efficient data structure: it needs only the pre-order or post-order code of each node, saving half the memory compared with the N-list.
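The vertical representation underlying N-lists and DIFFsets can be sketched in its simplest form, tidsets: each item maps to the set of ids of transactions containing it, and the support of a candidate itemset is the size of the intersection of its items' tidsets. Below is a minimal Eclat-style sketch for frequent 2-itemsets; the function names and data are illustrative and do not reproduce the cited algorithms' actual structures:

```python
from collections import defaultdict
from itertools import combinations

def vertical_format(transactions):
    """Item -> tidset (set of ids of the transactions containing the item)."""
    tids = defaultdict(set)
    for t, items in enumerate(transactions):
        for item in items:
            tids[item].add(t)
    return tids

def frequent_pairs(transactions, minsup):
    """Frequent 2-itemsets: support = size of the tidset intersection."""
    tids = vertical_format(transactions)
    return {
        (a, b): len(tids[a] & tids[b])
        for a, b in combinations(sorted(tids), 2)
        if len(tids[a] & tids[b]) >= minsup
    }

# Toy dataset (illustrative only).
transactions = [{"a", "b", "c"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
print(frequent_pairs(transactions, 3))  # {('a', 'c'): 3, ('b', 'c'): 3}
```

N-lists and DiffNodesets refine this idea by encoding FP-tree node positions instead of raw transaction ids, which is what makes their intersections compact on dense datasets.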
