Introduction
As an important sub-branch of data mining, association rule mining is used in many fields, such as student academic analysis, network log analysis, and network security. Traditional association rule mining algorithms include the Apriori algorithm (Agrawal R, 1993) and the FP-Growth algorithm (Han J and Pei J, 2000). However, as the volume of data to be mined grows, traditional association rule mining algorithms must repeatedly scan the transaction records, so I/O cannot be completed quickly, and they generate large numbers of candidate sets and too many frequent itemsets. Therefore, in recent years, researchers have paid increasing attention to multi-dimensional association rule mining. Kamber (1997) first proposed applying data cubes to association rule mining, using the structure of the data warehouse to pre-compute aggregate values and thereby speed up mining. Imielinski (2002) proposed combining On-Line Analytical Processing (OLAP) with association rule mining for pattern recognition. Zhang Lei (2020) proposed an improved Apriori algorithm based on a Boolean matrix: the transaction database is converted into a Boolean matrix for storage, which reduces computational complexity and saves a large amount of storage space. Li Jie (2020) proposed an improved parallel Apriori algorithm based on hash storage and transaction weighting, which reduces redundant computation through the deduplication property of hash storage; in addition, the mapping between items and itemsets is stored in a hash structure, which avoids scanning the transaction database multiple times. Wang Wei (2020) proposed an improved MapReduce-based Apriori algorithm for association rules with context constraints. The method incorporates users' context constraint rules and more precise pruning rules, and uses MapReduce for parallel computing to improve data processing capability and effectiveness.
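To make the Boolean-matrix idea concrete, the following is a minimal sketch of a generic level-wise Apriori in which the transaction database is stored as a Boolean matrix (rows are transactions, columns are items) and the support of an itemset is the number of rows where all of its columns are true. It is not Zhang Lei's exact algorithm; the function names (`to_boolean_matrix`, `support_count`, `apriori_boolean`) are hypothetical, and `min_support` is assumed to be an absolute transaction count.

```python
from itertools import combinations


def to_boolean_matrix(transactions):
    """Convert transactions into a Boolean matrix (rows = transactions, columns = items)."""
    items = sorted({item for t in transactions for item in t})
    matrix = [[item in t for item in items] for t in transactions]
    return items, matrix


def support_count(matrix, cols):
    """Count rows in which every column in `cols` is True (a logical AND of the columns)."""
    return sum(all(row[c] for c in cols) for row in matrix)


def apriori_boolean(transactions, min_support):
    """Level-wise Apriori whose support counting runs on the Boolean matrix."""
    items, matrix = to_boolean_matrix(transactions)
    frequent = {}
    # Frequent 1-itemsets, kept as sorted tuples of column indices.
    level = [(c,) for c in range(len(items)) if support_count(matrix, (c,)) >= min_support]
    for cols in level:
        frequent[cols] = support_count(matrix, cols)
    k = 2
    while level:
        prev = set(level)
        candidates = set()
        # Join step: merge (k-1)-itemsets that share their first k-2 columns.
        for a in level:
            for b in level:
                if a[:-1] == b[:-1] and a[-1] < b[-1]:
                    cand = a + (b[-1],)
                    # Prune step (anti-monotonicity): all (k-1)-subsets must be frequent.
                    if all(sub in prev for sub in combinations(cand, k - 1)):
                        candidates.add(cand)
        level = sorted(c for c in candidates if support_count(matrix, c) >= min_support)
        for cols in level:
            frequent[cols] = support_count(matrix, cols)
        k += 1
    # Translate column indices back into item names.
    return {tuple(items[c] for c in cols): n for cols, n in frequent.items()}
```

For example, on five small market-basket transactions with `min_support=3`, the pair `("beer", "diaper")` is returned with a count of 3, while `("bread", "diaper", "milk")` (count 2) is discarded. Once the matrix is built, no further pass over the original transaction list is needed, which is the storage-and-scan saving the Boolean-matrix approach targets.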
Guo Peng (2019) proposed a student course performance analysis method based on an improved K-means algorithm and an Apriori algorithm that incorporates interest. The method uses the improved K-means algorithm to discretize performance data and introduces an interest measure into the associations between courses to reveal how courses are related and how important each course is. Wen Wu (2019) proposed a genetic-algorithm-based algorithm (GNA) for finding frequent itemsets. It designs a k-step mining process, uses crossover operators to generate candidate sets and mutation operators to filter frequent itemsets, and thereby avoids multiple database scans and reduces redundancy. Hu Shichang (2019) proposed the Node-Apriori (Node-based Apriori) algorithm, which encodes itemsets and transaction records in binary and organizes candidate sets as nodes, effectively reducing both the memory occupied by itemsets and transaction records and the number of traversals of transaction itemsets. Guo Youqing (2019) proposed a MapReduce-based parallel algorithm for mining association patterns in big data (Mr_GNA), which combines the GNA algorithm with Hadoop's MapReduce parallel computing framework so that Mr_GNA can mine efficiently on a Hadoop cluster. Du Yongxing (2019) added a judgment dataset to the classic Apriori algorithm, which reduces candidate-set generation and a large amount of time consumption, improving the efficiency of the Apriori algorithm.
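The discretization step in Guo Peng's method turns numeric course scores into a small number of ordered levels before association rules are mined. The details of the improved K-means variant are not given here, so the sketch below assumes plain Lloyd's k-means on one-dimensional scores with a deterministic, evenly spread initialization; `kmeans_1d` and its parameters are hypothetical illustrations, not the authors' implementation.

```python
def kmeans_1d(scores, k, max_iter=100):
    """Plain Lloyd's k-means on one-dimensional scores.

    Returns the centroids in ascending order and, for each score, the index
    of its cluster in that order (0 = lowest performance level).
    """
    if k < 1 or len(scores) < k:
        raise ValueError("need 1 <= k <= len(scores)")
    values = sorted(scores)
    # Deterministic start (an assumption): k values spread evenly over the sorted scores.
    centroids = [values[(i * (len(values) - 1)) // max(k - 1, 1)] for i in range(k)]

    def nearest(s, cs):
        return min(range(len(cs)), key=lambda j: abs(s - cs[j]))

    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for s in scores:
            clusters[nearest(s, centroids)].append(s)
        # Recompute each centroid as its cluster mean; keep the old one if a cluster empties.
        updated = [sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    centroids = sorted(centroids)
    # Label every score by its nearest (sorted) centroid, giving an ordered level.
    labels = [nearest(s, centroids) for s in scores]
    return centroids, labels
```

For the scores `[45, 50, 52, 70, 72, 75, 90, 93, 95]` with `k=3`, this yields the levels `[0, 0, 0, 1, 1, 1, 2, 2, 2]`, which could then be used as items such as "course A: level 2" in an Apriori run.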
Qian Cheng (2019) proposed the Apriori_II (Apriori_Interest and Important) algorithm, which is based on interest items and an importance function; it reduces memory usage and the number of I/O operations, and improves both mining efficiency and the effectiveness of the resulting association rules. Feng Feng (2020) proposed using logical formulas for maximal association analysis on soft sets, combining all the key concepts of association rule mining and maximal association rules in a common framework, and correspondingly providing unified mathematical characterizations of these concepts. Luna J M (2018) proposed two algorithms without a pruning strategy, Apriori MapReduce (AprioriMR) and iterative AprioriMR, which extract every itemset that exists in the data; the search space is then pruned by means of the anti-monotone property. Youcef D (2018) proposed CGPUGA, an efficient parallel genetic algorithm that runs on GPU clusters to discover diverse association rules effectively, benefiting from cluster computing for rule generation. To promote association rule mining based on soft sets, Feng F (2016) proposed the new concepts of transactional data soft sets, parameter taxonomy soft sets, parameter cosets, realizations of parameter sets, and M-realizations. Several algorithms were designed to find the M-realizations of parameter sets and to extract σ-M-strong and γ-M-reliable maximal association rules in parameter taxonomy soft sets.
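The MapReduce-based approaches above (Wang Wei, Guo Youqing, Luna J M) share a common pattern: mappers emit candidate itemsets found in each transaction and reducers sum their counts. The sketch below simulates that pattern on a single machine, level by level, with anti-monotone pruning in the mapper. It is only an illustration under these assumptions, not Luna's AprioriMR or any Hadoop API; `map_phase`, `reduce_phase`, and `iterative_apriori_mr` are hypothetical names, the shuffle is replaced by a dictionary, and `min_support` is an absolute count.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(transaction, k, frequent_prev):
    """Mapper: emit (itemset, 1) for each k-itemset of the transaction whose
    (k-1)-subsets are all frequent (anti-monotone pruning)."""
    for cand in combinations(sorted(transaction), k):
        if k == 1 or all(sub in frequent_prev for sub in combinations(cand, k - 1)):
            yield cand, 1


def reduce_phase(pairs):
    """Reducer: sum the counts emitted for each itemset (shuffle simulated by a dict)."""
    totals = defaultdict(int)
    for itemset, count in pairs:
        totals[itemset] += count
    return totals


def iterative_apriori_mr(transactions, min_support):
    """Run one map/reduce round per itemset size until no frequent itemset remains."""
    frequent = {}
    frequent_prev = set()
    k = 1
    while True:
        pairs = (pair for t in transactions for pair in map_phase(t, k, frequent_prev))
        counts = reduce_phase(pairs)
        level = {s: n for s, n in counts.items() if n >= min_support}
        if not level:
            break
        frequent.update(level)
        frequent_prev = set(level)
        k += 1
    return frequent
```

Dropping the subset check in `map_phase` gives a variant without pruning, closer in spirit to extracting every existing itemset; keeping it illustrates how the anti-monotone property shrinks the search space at each round.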