Metaheuristics Guided by the Apriori Principle for Association Rule Mining: Case Study - CRO Metaheuristic

Metaheuristics Guided by the Apriori Principle for Association Rule Mining: Case Study - CRO Metaheuristic

Abir Derouiche, Abdesslem Layeb, Zineb Habbas
Copyright: © 2020 |Pages: 24
DOI: 10.4018/IJOCI.2020070102
OnDemand:
(Individual Articles)
Available
$37.50
No Current Special Offers
TOTAL SAVINGS: $37.50

Abstract

Association rule mining (ARM), one of the most known tasks in data mining, is considered as an optimization problem. The ARM problem can be solved either by exact methods or by metaheuristics. Exact methods such as Apriori algorithm are very efficient to deal with small and medium datasets. However, when dealing with large size datasets, these methods suffer from time complexity. Metaheuristics are proven faster but most of them suffer from accuracy. To deal with these two challenging issues, this work investigates to enhance metaheuristics and proposes hybrid approaches, which combine metaheuristics and the Apriori principle to intelligently explore the association rules space. To validate the proposed approaches the chemical reaction optimization metaheuristic (CRO) was used. Intensive experiments have been carried out and the first results are very promising in terms of accuracy and processing time.
Article Preview
Top

Introduction

Association Rule Mining (ARM) is the process of Data Mining research area that aims to find associations and co-relations between sets of items from a given transactional dataset, it has a lot of applications in diverse areas. Nowadays ARM is considered as an important optimization problem. This problem can be solved either by exact methods or by metaheuristics as well. Among the different exact methods, the Apriori algorithm is one of the most efficient techniques used to solve ARM problem based on an interesting principle called the Apriori principle. The Apriori principle is easy to implement and it has a lot of advantages when pruning the search space. Unfortunately, the Apriori algorithm has high time complexity therefore, it is in practice limited only to mine small and medium-sized datasets.

On the other hand, several metaheuristics have been developed in the last decades to deal with ARM problem. They deal generally with large datasets. However, most population-based metaheuristics used, suffer from both rules quality and time complexity. These problems are due to the random initialization and the random exploration of the search space. To deal with these drawbacks, and inspired by the work of (Djenouri & Comuzzi, 2017), this paper proposes a framework to enhance the performance of any population-based metaheuristic used to mine association rules by exploiting the principle of Apriori.

Mainly this contribution is based on the two following ideas:

  • The random generation of the initial population considers all items including the infrequent ones, whereas the infrequent items are useless because they always lead to infrequent rules. In the proposed contribution, only frequent items are considered the initial random population is replaced by a set of valid rules of size two obtained using the Apriori principle;

  • In the search exploration step, the population-based metaheuristic considers a special list of frequent itemsets instead of using all the itemsets. This strategy discards a lot of useless sub-regions and it significantly improves the running time of the considered metaheuristic.

The considered metaheuristic can exploit the initial rules in two different ways. So two original population-based metaheuristics guided by the Apriori principle for solving ARM are proposed. The first approach called MGapriori-ARM1 for “based-population Metaheuristic Guided by the Apriori principle for ARM approach 1” is different from most metaheuristics used for ARM problem. Indeed, this approach is used in an iterative way in order to enhance at each iteration a set of relevant rules of size k created using the ExpandRules procedure proposed in (T. T. Nguyen et al., 2017). On the other hand, the second approach called MGapriori-ARM2 for “based-population Metaheuristic Guided by the Apriori principle for ARM approach 2” follows the process of most metaheuristics. MGapriori-ARM2 starts with an initial population then it improves it before selecting the best solutions. However, this approach considers a different initialization procedure and a different iteration phase as well. The initial population is a set of valid rules of size two. Besides, in the iteration step, the proposed metaheuristic should be applied in a specific way to avoid stagnation. The exploitation operators can modify the solution without changing the size of the rule, while the exploration operators add at least one item to the rule from the set of frequent items.

To validate the proposed approaches from experimental consideration, without loss of generality, the metaheuristic CRO (Chemical Reaction Optimization metaheuristic) has been used. This choice is motivated by the fact that CRO is a relatively recent based-population metaheuristic that has been very successful for solving many optimization problems. Both approaches have been implemented and tested. First, the two approaches are compared in terms of quality and time complexity, then compared to other related works. The first experimental results are rather promising in terms of accuracy and processing time for small, medium, and large datasets.

Complete Article List

Search this Journal:
Reset
Volume 14: 1 Issue (2024): Forthcoming, Available for Pre-Order
Volume 13: 1 Issue (2023)
Volume 12: 4 Issues (2022)
Volume 11: 4 Issues (2021)
Volume 10: 4 Issues (2020)
Volume 9: 4 Issues (2019)
Volume 8: 4 Issues (2018)
Volume 7: 4 Issues (2017)
Volume 6: 4 Issues (2016)
Volume 5: 4 Issues (2015)
Volume 4: 4 Issues (2014)
Volume 3: 4 Issues (2012)
Volume 2: 4 Issues (2011)
Volume 1: 4 Issues (2010)
View Complete Journal Contents Listing