1. Introduction
Data mining discovers knowledge from databases, and its tasks can be grouped into several categories, including classification, clustering, dependency analysis and data visualization. Data classification and association rule mining are two important data mining techniques. Data classification is a systematic approach to building a classifier (model) from a training dataset that contains instances with known class labels. The classifier then assigns each input sample whose class label is unknown to one of a predefined set of class labels. Association rule mining, in contrast, discovers a set of association rules from a dataset, where an association rule expresses a possible correlation between different features in the dataset. Rule-Based Classification (RBC), also called Associative Classification (AC), is the integration of these two techniques (Liu et al. 1998). RBC generates classification models in the form of a set of classification rules from a sample of instances with known class labels, where each rule consists of an antecedent (a set of attribute values) and a consequent (a class value). The generated classification model can then be used to predict the classes of future examples whose labels are unknown.
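To make the rule structure concrete, the following minimal Python sketch represents a classification rule as an antecedent (attribute-value pairs) and a consequent (a class value), and predicts the label of a new example with an ordered rule list. The names `Rule`, `covers` and `predict`, the first-match policy, the optional default class and the toy weather attributes are illustrative assumptions, not part of the method described in this paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Rule:
    antecedent: Dict[str, str]  # attribute -> required value
    consequent: str             # predicted class value

    def covers(self, instance: Dict[str, str]) -> bool:
        # The rule fires only if every attribute value in the antecedent matches.
        return all(instance.get(a) == v for a, v in self.antecedent.items())


def predict(rules: List[Rule], instance: Dict[str, str],
            default: Optional[str] = None) -> Optional[str]:
    # Assumed first-match policy over an ordered rule list.
    for rule in rules:
        if rule.covers(instance):
            return rule.consequent
    return default  # no rule covers the instance


rules = [
    Rule({"outlook": "sunny", "humidity": "high"}, "no"),
    Rule({"outlook": "overcast"}, "yes"),
]
print(predict(rules, {"outlook": "overcast", "humidity": "high"}))  # yes
print(predict(rules, {"outlook": "rain", "humidity": "normal"}, default="yes"))  # yes
```

A rule list with a first-match policy is one common way to apply the output of a sequential covering strategy; other conflict-resolution schemes (for example voting among all matching rules) are equally possible.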
In contrast to other learning approaches such as statistical approaches or neural networks, RBC produces models that are easily interpretable and explainable, especially when the number of rules is small. Consequently, RBC has been used in various application areas, including areas that require an interpretable classifier, such as medical diagnosis, credit scoring (Asghar et al. 2017) and bioinformatics (Lakshmanna et al. 2016).
To obtain accurate classifiers, in this paper we aim to discover a list of classification rules by applying a Sequential Covering Strategy (SCS), which creates one rule at a time. At each stage, SCS extracts one rule from the training dataset and adds it to the set of previously discovered rules; all instances covered by the new rule are then removed from the training dataset. This process is repeated until all instances of the training dataset are covered. However, as a consequence of the explosive growth of real-world databases in several domains, obtaining efficient and accurate rules from such databases is a complex problem that requires intelligent systems. Thus, in this paper we apply a new swarm-based intelligent approach to this problem.
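The sequential covering loop just described can be sketched in Python as follows. The rule search is passed in as a function; `greedy_one_condition_rule` is a hypothetical stand-in (a simple greedy search over single attribute=value tests), not the swarm-based search proposed in this paper, and all names and the toy data are illustrative assumptions.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple

Instance = Dict[str, str]
Rule = Tuple[Dict[str, str], str]  # (antecedent, consequent)
Example = Tuple[Instance, str]     # (attribute values, class label)


def covers(rule: Rule, x: Instance) -> bool:
    antecedent, _ = rule
    return all(x.get(a) == v for a, v in antecedent.items())


def greedy_one_condition_rule(data: List[Example]) -> Optional[Rule]:
    # Hypothetical stand-in for the rule search step: choose the single
    # attribute=value test whose covered examples are purest (ties broken by
    # larger coverage) and predict their majority class.
    best: Optional[Rule] = None
    best_score = (-1.0, -1)
    for x, _ in data:
        for a, v in x.items():
            labels = Counter(y for xx, y in data if xx.get(a) == v)
            cls, hits = labels.most_common(1)[0]
            total = sum(labels.values())
            score = (hits / total, total)
            if score > best_score:
                best, best_score = ({a: v}, cls), score
    return best


def sequential_covering(data: List[Example],
                        find_rule: Callable[[List[Example]], Optional[Rule]]) -> List[Rule]:
    rules: List[Rule] = []
    remaining = list(data)
    while remaining:  # repeat until every training example is covered
        rule = find_rule(remaining)
        if rule is None or not any(covers(rule, x) for x, _ in remaining):
            break  # guard: a rule covering nothing would loop forever
        rules.append(rule)  # add the new rule to the discovered list
        remaining = [(x, y) for x, y in remaining if not covers(rule, x)]  # drop covered examples
    return rules


data = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "stay"),
    ({"outlook": "rain", "windy": "yes"}, "stay"),
    ({"outlook": "overcast", "windy": "no"}, "play"),
]
rules = sequential_covering(data, greedy_one_condition_rule)
print(rules)  # [({'windy': 'no'}, 'play'), ({'windy': 'yes'}, 'stay')]
```

Keeping the rule finder as a parameter separates the covering loop from the search strategy, so a swarm-based rule search could be plugged in without changing the loop itself.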
In the past decade, many new swarm-based optimization approaches have been proposed and used successfully for optimization problems (RM, S. P et al. 2020), such as the Crow Search Algorithm (CSA). CSA is a swarm-based optimization algorithm proposed by Askarzadeh in 2016 that has recently been widely used for several optimization problems due to its simplicity, the small number of parameters to tune (only the flight length and the awareness probability), its search capability and its fast convergence. Moreover, CSA shows competitive results on continuous optimization problems compared with several other swarm-based approaches in terms of convergence speed and optimization accuracy (Askarzadeh 2016). However, CSA has no discrete version, and it has not been applied to the classification rule generation problem. In this work we extend its application to the Class Association Rules (CAR) mining problem within the sequential covering approach for associative classification.
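For reference, the continuous CSA update with its two parameters can be sketched as below, following the scheme described by Askarzadeh (2016): each crow i picks a random crow j; if a uniform random number is at least the awareness probability AP, crow i moves to x_i + r_i * fl * (m_j - x_i), where m_j is crow j's memory, otherwise it jumps to a random position; memories are updated when a new position is better. This is a minimal sketch assuming minimization over a box-shaped search space and a sphere test function; the function and parameter names (`crow_search`, `fl`, `ap`) are our own.

```python
import random
from typing import Callable, List, Sequence, Tuple


def crow_search(f: Callable[[Sequence[float]], float], dim: int,
                lower: float, upper: float, n_crows: int = 20,
                fl: float = 2.0, ap: float = 0.1, iters: int = 200,
                seed: int = 0) -> Tuple[List[float], float, List[float]]:
    """Minimize f over the box [lower, upper]^dim with the continuous CSA."""
    rng = random.Random(seed)

    def random_position() -> List[float]:
        return [rng.uniform(lower, upper) for _ in range(dim)]

    x = [random_position() for _ in range(n_crows)]  # current positions
    mem = [p[:] for p in x]                          # hiding places (best found so far)
    mem_fit = [f(p) for p in mem]
    history: List[float] = []
    for _ in range(iters):
        new_x = []
        for i in range(n_crows):
            j = rng.randrange(n_crows)               # crow i follows a random crow j
            if rng.random() >= ap:                   # j unaware: move toward j's memory
                r = rng.random()
                cand = [xi + r * fl * (mj - xi) for xi, mj in zip(x[i], mem[j])]
            else:                                    # j aware: i goes to a random position
                cand = random_position()
            # Infeasible candidates are rejected and crow i stays in place.
            new_x.append(cand if all(lower <= v <= upper for v in cand) else x[i])
        x = new_x
        for i in range(n_crows):                     # evaluate, then update memories
            fit = f(x[i])
            if fit < mem_fit[i]:
                mem[i], mem_fit[i] = x[i][:], fit
        history.append(min(mem_fit))
    best = min(range(n_crows), key=lambda k: mem_fit[k])
    return mem[best], mem_fit[best], history


def sphere(p: Sequence[float]) -> float:
    return sum(v * v for v in p)


pos, val, hist = crow_search(sphere, dim=2, lower=-5.0, upper=5.0, iters=300)
print(val)
```

Because a memory is overwritten only by a strictly better position, the best remembered fitness never gets worse from one iteration to the next. The update arithmetic also shows why the original CSA cannot be used directly on discrete rule encodings: x_i + r_i * fl * (m_j - x_i) generally produces non-integer coordinates.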
Optimization problems are either continuous or discrete. In this study we address CAR mining in associative classification, which is a discrete problem, whereas the original CSA was proposed for solving continuous problems; therefore, the original CSA cannot be applied directly to this problem. Building on the continuous crow search algorithm, this study proposes a New Discrete Crow Search Algorithm adapted to the Class Association Rules mining problem (NDCSA-CAR), in which the search space is modelled as d-dimensional, where d is the number of features in the dataset. To adapt the original CSA to the CAR problem, a discrete encoding for representing classification rules is proposed, and new discrete operators are introduced in the crows' position-update equations.
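Why a dedicated discrete encoding and operator are needed can be illustrated with a small sketch. It assumes a hypothetical encoding in which each of the d dimensions stores an integer, 0 for "feature not used in the antecedent" and k >= 1 for the k-th value of that feature, with the class value kept outside the vector; `discrete_move` is likewise a hypothetical operator. Neither is the encoding or operator of NDCSA-CAR, whose details are not given in this section.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy schema: (feature name, value domain) for d = 3 features.
FEATURES: List[Tuple[str, List[str]]] = [
    ("outlook", ["sunny", "overcast", "rain"]),
    ("humidity", ["high", "normal"]),
    ("windy", ["yes", "no"]),
]


def decode(position: Sequence[int]) -> Dict[str, str]:
    # Gene k = 0 means "feature unused"; k >= 1 selects the k-th domain value.
    return {name: values[k - 1]
            for (name, values), k in zip(FEATURES, position) if k != 0}


def random_position(rng: random.Random) -> List[int]:
    # Each dimension takes an integer in {0, 1, ..., |domain|}.
    return [rng.randint(0, len(values)) for _, values in FEATURES]


def discrete_move(x: Sequence[int], memory: Sequence[int], fl: float,
                  rng: random.Random) -> List[int]:
    # Hypothetical operator: the continuous step x + r*fl*(m - x) would yield
    # non-integer genes, so instead each gene is copied from the followed
    # crow's memory with probability min(1, r*fl) and otherwise kept.
    p = min(1.0, rng.random() * fl)
    return [m if rng.random() < p else g for g, m in zip(x, memory)]


rng = random.Random(1)
crow = random_position(rng)
moved = discrete_move(crow, [3, 2, 1], fl=2.0, rng=rng)
print(decode([1, 0, 2]))  # {'outlook': 'sunny', 'windy': 'no'}
print(decode(moved))
```

Whatever operator is used, it must map valid integer vectors to valid integer vectors, which is exactly the property the continuous update lacks.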
The main contributions of this work are:
- A novel discrete swarm-based algorithm is introduced.
- This paper solves the CAR mining problem within the sequential covering approach for associative classification.
- The proposed algorithm is applied to datasets collected from a survey.
- The results of the proposed algorithm are compared with those produced by traditional and recent well-known approaches.