Efficient Closure Operators for FCA-Based Classification

Nida Meddouri (Caen-Normandy University, France & GREYC, ENSICAEN, Caen, France) and Mondher Maddouri (College of Business, University of Jeddah, Saudi Arabia)
DOI: 10.4018/IJAIML.2020070105

Abstract

Knowledge discovery in databases (KDD) aims to exploit the large amounts of data collected every day in various fields of computing. The idea is to extract hidden knowledge from a set of data. KDD gathers several tasks that form a process, such as: data selection, pre-processing, transformation, data mining, visualization, etc. Data mining techniques include supervised and unsupervised classification. Supervised classification consists of predicting the class of new instances with a classifier built on learning data of labeled instances. Several approaches have been proposed, such as: the induction of decision trees, Bayesian methods, nearest-neighbor search, neural networks, support vector machines, and formal concept analysis. Learning with formal concepts always refers to the mathematical structure of the concept lattice. This article presents a state of the art on formal concept analysis classifiers. The authors present different ways to compute closure operators from nominal data, and they also present a new approach that builds only the part of the lattice containing the best concepts. This approach is based on Dagging, an ensemble method that generates a set of classifiers, each representing a formal concept, and combines them by a voting rule. Experimental results are given to demonstrate the efficiency of the proposed method.

Introduction

The classification approach based on formal concept analysis is a symbolic approach allowing the extraction of correlations, reasons, and rules from the concepts discovered in data. Many learning methods based on Formal Concept Analysis have been proposed, such as: the JSM-method (Blinova, Dobrynin, Finn, Kuznetsov & Pankratova, 2003), CLANN (Tsopze, Mephu-Nguifo & Tindo, 2007), CITREC (Douar, Latiri & Slimani, 2008), NAVIGALA (Visani, Bertet & Ogier, 2011), HMCS-FCA-SC (Ferrandin et al., 2013), SPFC (Ikeda & Yamamoto, 2013) and MCSD-FCA-PS (Buzmakov et al., 2016). Unfortunately, this approach suffers from several problems: exponential complexity (in the worst case), a high error rate, and over-fitting (Meddouri & Maddouri, 2008, 2010). Most of these methods handle only binary data. The construction of all concepts can be either exhaustive or non-contextual, and there is no adaptive selection of concepts (Meddouri & Maddouri, 2008).
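To make the notion of formal concepts concrete, the following is a minimal sketch of the two derivation operators of Formal Concept Analysis on a small binary context; their composition is the closure operator from which formal concepts are built. The context, object names, and attribute names here are illustrative assumptions, not data from the article.

```python
# A binary formal context: each object maps to the set of attributes it has.
# Objects/attributes are hypothetical examples for illustration.
context = {
    "o1": {"a", "b"},
    "o2": {"a", "c"},
    "o3": {"a", "b", "c"},
}
attributes = set().union(*context.values())

def intent(objects):
    """Derivation A': attributes shared by every object in the set."""
    if not objects:
        return set(attributes)
    return set.intersection(*(context[o] for o in objects))

def extent(attrs):
    """Derivation B': objects possessing every attribute in the set."""
    return {o for o, atts in context.items() if attrs <= atts}

def closure(attrs):
    """B'': the smallest closed attribute set containing attrs."""
    return intent(extent(attrs))

# ({"o1", "o3"}, {"a", "b"}) is a formal concept: its extent and intent
# determine each other under the derivation operators.
print(closure({"b"}))       # the closed attribute set generated by {"b"}
print(extent({"a", "b"}))   # the matching extent
```

A set of attributes B is closed when closure(B) == B; each such closed set, paired with its extent, is one node of the concept lattice that FCA-based classifiers work with.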

For these reasons, our research has focused on ensemble methods, which are used to improve the error rate of any single learner. We proposed the BFC (Meddouri & Maddouri, 2009) and BNC (Meddouri & Maddouri, 2010) methods based on sequential learning (Boosting): in each learning step, all the data are considered and weights are assigned to the learning instances. However, it was shown that sequential learning (Boosting) is insufficient to produce a classifier more efficient than a Decision Tree (Meddouri & Maddouri, 2010). Other ensemble learning methods exist that are based on parallel learning. The difference between these two families of ensemble methods lies in how the learning data are selected: they are distinguished by data sampling techniques, such as Bootstrapping, used to learn the classifiers from particular subsets. The particularity of learning from a Bootstrap is to combine hard learning instances with misleading instances in the training set (unlike the sequential approach) (Breiman, 1996a, 1996b). The best-known method based on this type of learning is Dagging (Disjoint samples aggregating) (Kotsiantis, Anyfantis, Karagiannopoulus & Pintelas, 2007), which creates a number of disjoint, stratified groups from the original learning data set (Ting & Witten, 1997), each considered as a learning subset. A classifier is built on each learning set, and predictions are then obtained by combining the classifiers' outputs through majority voting (Ting & Witten, 1997). This method has shown its importance in recent work (Meddouri, Khoufi & Maddouri, 2014). We therefore propose to use this technique to study classifier ensembles based on formal concepts, since no study has yet examined formal concepts in the context of parallel learning.
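The Dagging scheme described above can be sketched as follows: split the training data into disjoint subsets, train one base classifier per subset, and combine predictions by majority vote. The round-robin split, the 1-nearest-neighbour base learner, and the toy data are illustrative assumptions, not the authors' actual setup (which uses formal-concept classifiers).

```python
from collections import Counter

def nn_classifier(train):
    """Tiny 1-nearest-neighbour base learner over (x, label) pairs."""
    def predict(x):
        return min(train, key=lambda p: abs(p[0] - x))[1]
    return predict

def dagging(data, n_folds, base_learner):
    """Train one base classifier per disjoint (round-robin) fold."""
    folds = [data[i::n_folds] for i in range(n_folds)]
    return [base_learner(fold) for fold in folds if fold]

def vote(classifiers, x):
    """Combine the ensemble's predictions by majority voting."""
    return Counter(clf(x) for clf in classifiers).most_common(1)[0][0]

# Hypothetical 1-D training data: three negatives near 0, three positives near 1.
data = [(0.1, "neg"), (0.2, "neg"), (0.3, "neg"),
        (0.9, "pos"), (1.0, "pos"), (1.1, "pos")]
ensemble = dagging(data, n_folds=3, base_learner=nn_classifier)
print(vote(ensemble, 0.15))   # all three members predict "neg"
print(vote(ensemble, 1.05))   # all three members predict "pos"
```

The disjoint folds (unlike Bootstrap resampling in Bagging, which draws with replacement) mean every instance is seen by exactly one ensemble member; a stratified split would additionally preserve class proportions within each fold.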

In Section 2, we present a state of the art on Formal Concept Analysis. In Section 3, we propose classifiers using closure operators based on Formal Concept Analysis. In Section 4, an experimental study evaluates the performance of nominal classifiers based on different closure operators; a second experimental study shows the importance of parallel learning, compared to single learning, for classifiers based on Formal Concept Analysis.
