Research and Application of a Multidimensional Association Rules Mining Method Based on OLAP

Hairong Wang, Pan Huang, Xu Chen

Source Title: International Journal of Information Technology and Web Engineering (IJITWE) 16(1)

DOI: 10.4018/IJITWE.2021010104

Abstract

To address the low mining efficiency, limited dimensionality, and low accuracy of traditional multidimensional association rule mining in the university big data environment, an OLAP-based multidimensional association rule mining method is proposed. The method combines a hash function with marked transaction compression to curb the excessive or redundant candidate sets generated by the Apriori algorithm, and uses Online Analytical Processing (OLAP) to manage the intermediate data of the mining process, which reduces the time overhead caused by repeated calculations. To verify the method, a learning situation analysis system is built for colleges and universities. The method is applied to more than 21,000 desensitized real records to mine the key factors affecting students' academic performance. The experimental results show that the proposed multidimensional mining model produces good mining results and significantly improves time performance.

Introduction

As an important branch of data mining, association rule mining is used in many areas, such as student academic analysis, network log analysis, and network security. Traditional association rule mining algorithms include the Apriori algorithm (Agrawal R, 1993) and the FP-Growth algorithm (Han J and Pei J, 2000). However, as the volume of data grows, these algorithms must scan the transaction records repeatedly, so I/O cannot be completed quickly, and they generate a large number of candidate sets and frequent itemsets. In recent years, researchers have therefore paid increasing attention to multidimensional association rule mining. Kamber (1997) first proposed applying data cubes to association rule mining, using the structure of the data warehouse to precompute aggregate values and thereby speed up mining. Imielinski (2002) proposed combining Online Analytical Processing (OLAP) with association rule mining for pattern recognition. Zhang Lei (2020) proposed an improved Apriori algorithm based on a Boolean matrix: the transaction database is converted into a Boolean matrix for storage, which reduces computational complexity and saves considerable storage space. Li Jie (2020) proposed an improved parallel Apriori algorithm based on hash storage and transaction weighting. It reduces redundant calculations through the deduplication property of hash storage and stores the mapping between items and itemsets in a hash structure, which avoids scanning the transaction database multiple times. Wang Wei (2020) proposed an improved MapReduce-based Apriori algorithm for association rules with context constraints. The method incorporates the user's context constraint rules and more precise pruning rules, and uses MapReduce for parallel computing to improve data processing capacity and effectiveness.
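The Boolean-matrix idea described above can be illustrated with a minimal sketch. This is not the algorithm of Zhang Lei (2020); it only assumes that each transaction becomes one row of 0/1 flags and that the support count of an itemset is the number of rows in which all of its columns are 1. The function names and the course data are hypothetical.

```python
def to_boolean_matrix(transactions):
    """Encode each transaction as a row of 0/1 flags, one column per distinct item."""
    items = sorted({item for t in transactions for item in t})
    matrix = [[1 if item in t else 0 for item in items] for t in transactions]
    return items, matrix


def support_count(matrix, columns):
    """Count the rows in which every selected column is 1 (a logical AND of columns)."""
    return sum(1 for row in matrix if all(row[c] for c in columns))


# Hypothetical transactions: the courses in which a student scored well.
transactions = [
    {"math", "physics"},
    {"math", "english"},
    {"math", "physics", "english"},
    {"physics"},
]

items, matrix = to_boolean_matrix(transactions)
pair = [items.index("math"), items.index("physics")]
print(support_count(matrix, pair))  # 2: the first and third transactions contain both
```

Once the matrix is built, every support count is computed from it in memory, so the original transaction records do not have to be read again for each candidate.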
Guo Peng (2019) proposed a student course performance analysis method based on improved K-means and Apriori with an interest measure. The method uses an improved K-means algorithm to discretize performance information and introduces interest to capture the associations between courses, their connections, and the importance of each course. Wen Wu (2019) proposed GNA, an algorithm that uses a genetic algorithm to find frequent itemsets. It defines a k-step mining process in which crossover operators generate candidate sets and mutation operators filter frequent itemsets, which avoids multiple scans of the database and reduces redundancy. Hu Shichang (2019) proposed the Node-Apriori (Node based Apriori) algorithm, which encodes itemsets and transaction records in binary and organizes candidate sets as nodes, effectively reducing both the memory occupied by itemsets and transaction records and the number of traversals of transaction itemsets. Guo Youqing (2019) proposed Mr_GNA, a MapReduce-based parallel mining algorithm for association patterns in big data, which combines the GNA algorithm with Hadoop's MapReduce parallel computing framework so that mining can run efficiently on a Hadoop cluster. Du Yongxing (2019) added a judgment data set to the classic Apriori algorithm, which reduces the generation of candidate sets, cuts time consumption substantially, and improves the efficiency of the algorithm.
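The binary encoding of itemsets and transaction records mentioned above can be sketched with integer bitmasks. This is not the Node-Apriori algorithm itself and omits its node structure; it only assumes that each item is assigned one bit, so that a subset test becomes a single bitwise AND. All names and data below are hypothetical.

```python
def build_index(transactions):
    """Assign each distinct item a bit position."""
    items = sorted({item for t in transactions for item in t})
    return {item: pos for pos, item in enumerate(items)}


def encode(itemset, index):
    """Pack an itemset into one integer, setting the bit of each item it contains."""
    mask = 0
    for item in itemset:
        mask |= 1 << index[item]
    return mask


def support_count(candidate_mask, transaction_masks):
    """A transaction contains the candidate when all candidate bits are set in it."""
    return sum(1 for t in transaction_masks if (t & candidate_mask) == candidate_mask)


# Hypothetical transactions over three course identifiers.
transactions = [["C1", "C2"], ["C1", "C3"], ["C1", "C2", "C3"]]
index = build_index(transactions)
transaction_masks = [encode(t, index) for t in transactions]

candidate = encode(["C1", "C2"], index)
print(support_count(candidate, transaction_masks))  # 2
```

Each transaction is stored as one integer instead of a list of strings, which is where the memory saving in such encodings comes from.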
Qian Cheng (2019) proposed the Apriori_II (Apriori_Interest and Important) algorithm, which is based on interest items and importance functions. It reduces memory usage and the number of I/O operations, and improves both mining efficiency and the effectiveness of the resulting association rules. Feng Feng (2020) proposed using logical formulas for maximal association analysis on soft sets, bringing the key concepts of rule mining and maximal association rules into a common framework and providing unified mathematical characterizations of these concepts. Luna J M (2018) proposed two algorithms without a pruning strategy, Apriori MapReduce (AprioriMR) and iterative AprioriMR. These algorithms extract any itemset present in the data; the search space is then trimmed through the anti-monotone property. Youcef D (2018) proposed CGPUGA, a parallel genetic algorithm that runs on GPU clusters to discover diverse association rules efficiently and uses cluster computing to generate the rules. To promote association rule mining based on soft sets, Feng F (2016) proposed the new concepts of transaction data soft set, parameter taxonomy soft set, parameter coset, parameter set realization, and M-realization. Several algorithms were designed to find the M-realizations of parameter sets, or to extract the σ-M-strong and γ-M-reliable maximal association rules in parameter taxonomy soft sets.
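The anti-monotone property referred to above states that every subset of a frequent itemset must itself be frequent, so a candidate with any infrequent subset can be discarded before it is counted. The following is a minimal sketch of classic level-wise Apriori with this pruning step. It is not the method of any of the cited works, and the function name, threshold, and data are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return a dict mapping each frequent itemset (frozenset) to its support count."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    all_frequent = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: unite pairs of frequent (k-1)-itemsets that form a k-itemset.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (anti-monotone property): drop a candidate when any
        # (k-1)-subset is not frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Count step: one pass over the transactions for the surviving candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_count}
        all_frequent.update(frequent)
        k += 1
    return all_frequent


# Hypothetical transactions over items A, B, C.
transactions = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B", "C"},
]
result = apriori(transactions, min_count=3)
```

With a threshold of 3, all single items and all pairs are frequent, while {A, B, C} appears in only two transactions and is rejected at the count step. The repeated pass over the transactions in the count step is the overhead that the improvements surveyed above aim to reduce.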
