DFC: A Performant Dagging Approach of Classification Based on Formal Concept

Nida Meddouri, Hela Khoufi, Mondher Maddouri
DOI: 10.4018/IJAIML.20210701.oa3

Abstract

Knowledge discovery in databases (KDD) is a research field that has evolved to exploit the large data sets collected every day from various fields of computing applications. The underlying idea is to extract hidden knowledge from a data set. KDD comprises several tasks that form a process, such as data mining. Classification and clustering are data mining techniques. Several classification approaches have been proposed, such as induction of decision trees, Bayes nets, support vector machines, and formal concept analysis (FCA). The choice of FCA can be explained by its ability to extract hidden knowledge. Recently, researchers have become interested in ensemble methods (sequential/parallel) that combine a set of classifiers. The classifiers are combined by a voting technique. There has been little focus on FCA in the context of ensemble learning. This paper presents a new approach to building a single part of the concept lattice containing the best possible concepts. The approach is based on parallel ensemble learning. It improves on state-of-the-art methods based on FCA, since it handles more voluminous data.

Introduction

In this paper, we are interested in classification. Classification is a two-phase process: a learning phase, which organizes the information extracted from a set of objects (or data), and a classification phase, which determines the label/class of new objects. Many supervised classification techniques have been proposed, such as classification by Formal Concept Analysis, decision trees, Bayes nets, SVMs, and neural networks.

Formal Concept Analysis is a formalization of the philosophical notion of a concept, defined as a pair consisting of the extension and the intention of the concept. The intention of a concept refers to the necessary and sufficient attributes of the concept in question. The extension of a concept is the set of instances covered by this concept. Several classification methods based on Formal Concept Analysis have been proposed, since it guarantees semantic richness (Poelmans et al., 2013). Unfortunately, these classification methods encounter some problems, such as exponential complexity (in the worst case), a high error rate, and overfitting (Meddouri & Maddouri, 2009; Meddouri & Maddouri, 2010).
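The extension/intention pairing above can be sketched with the two standard FCA derivation operators on a toy binary context. This is a minimal illustration, not the paper's implementation; the names (`context`, `derive_objects`, `derive_attrs`) are ours.

```python
# Toy binary context: each object is mapped to the set of attributes it has.
context = {
    "o1": {"a", "b"},
    "o2": {"a", "c"},
    "o3": {"a", "b", "c"},
}

def derive_objects(objs):
    """Intention: attributes shared by every object in objs."""
    sets = [context[o] for o in objs]
    if not sets:  # the empty object set yields all attributes
        return {a for s in context.values() for a in s}
    return set.intersection(*sets)

def derive_attrs(attrs):
    """Extension: objects possessing every attribute in attrs."""
    return {o for o, s in context.items() if attrs <= s}

# (extension, intention) is a formal concept when the two operators
# close on each other: derive_objects(extension) == intention and
# derive_attrs(intention) == extension.
extension = derive_attrs({"a", "b"})        # {"o1", "o3"}
intention = derive_objects(extension)       # {"a", "b"}
print(sorted(extension), sorted(intention))
```

Here ({o1, o3}, {a, b}) is a formal concept: the two objects share exactly the attributes a and b, and those attributes select exactly those two objects.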

Several ensemble methods are used to improve the error rate of any single learner (Freund, 1995; Freund & Schapire, 1996). These methods are based on sequential learning (Boosting): all the data are considered in each learning step, and weights are assigned to the learning instances. However, Kuncheva reported that sequential learning (Boosting) is not sufficient for efficient classifiers such as decision trees (Kuncheva et al., 2002). In the area of supervised learning, other ensembles exist that are based on parallel learning. The difference between these two families of ensemble methods lies in how the data for learning are selected. They are distinguished by the data sampling technique, such as bootstrapping, used to learn the classifiers from subsets. A particularity of learning from a bootstrap is that hard-to-learn instances are mixed with misleading instances in the training set (unlike the sequential approach) (Breiman, 1996; Breiman, 1996b; Kuncheva, 2004).
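The bootstrapping mentioned above, as used by parallel ensembles such as Bagging, simply draws a training set of the same size by sampling with replacement. A hedged sketch (the function name is illustrative):

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) instances with replacement.

    On average about 63.2% of the distinct original instances appear
    at least once in a bootstrap sample; the rest are duplicated slots.
    """
    return [rng.choice(data) for _ in data]

rng = random.Random(0)
train = list(range(10))
sample = bootstrap_sample(train, rng)
print(sample)  # same size as train; duplicates are allowed
```

Each classifier in a parallel ensemble is then learned on its own such sample, independently of the others, which is what makes the learning step parallelizable.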

Little research has focused on classification based on Formal Concept Analysis in the context of parallel learning. We propose to use and study Formal Concept Analysis in this context and to compare it with the sequential approach. The best-known method based on parallel learning is Dagging (Disjoint samples aggregating), which creates a number of disjoint, stratified samples from the original learning data set (Ting & Witten, 1997), each considered as a learning subset. A weak learner is built on each of these learning sets. Predictions are then obtained by combining the classifier outputs by majority vote (Ting & Witten, 1997; Davison & Sardy, 2006). This method has shown its importance in recent work. We therefore propose to use this technique to study classifier ensembles based on formal concepts, since a limited number of studies have focused on formal concepts in the context of parallel learning.
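The Dagging scheme just described can be sketched schematically: partition the data into disjoint, stratified folds, train one weak learner per fold, and combine the outputs by majority vote. This is a self-contained illustration with a toy weak learner, not the paper's FCA-based classifier; all names are ours.

```python
import random
from collections import Counter

def stratified_disjoint_folds(y, k, seed=0):
    """Split instance indices into k disjoint folds, keeping the class
    proportions roughly equal across folds (stratification)."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):  # deal indices round-robin
            folds[j % k].append(i)
    return folds

class MajorityClassLearner:
    """Toy 'weak learner': always predicts the most frequent class it saw."""
    def fit(self, y):
        self.label = Counter(y).most_common(1)[0][0]
        return self
    def predict(self, x_new):
        return self.label

def dagging_predict(y, x_new, k=3):
    """Train one weak learner per disjoint fold; majority vote on x_new."""
    folds = stratified_disjoint_folds(y, k)
    votes = [MajorityClassLearner().fit([y[i] for i in f]).predict(x_new)
             for f in folds]
    return Counter(votes).most_common(1)[0][0]

y = ["+"] * 6 + ["-"] * 3
print(dagging_predict(y, x_new=0))  # every fold sees a "+" majority
```

Unlike bootstrapping, the folds are disjoint, so each weak learner sees a smaller but non-overlapping slice of the data; the vote then aggregates these independent views.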

First, we present the basics of Formal Concept Analysis and several classification methods based on the concept lattice or on a sub-lattice of concepts. Second, we present ensemble methods based on sequential and parallel learning, and we propose a new method exploiting the advantages of Dagging to generate and combine, in a parallel way, weak concept learners. Then, we present classifiers based on FCA and an improvement of these classifiers. A comparative experimental study is presented to evaluate the performance of concept ensembles according to criteria such as the number, variety, and type of classifiers. Finally, the comparative study shows the importance of parallel learning compared to sequential learning.
