Introduction
A recognized issue associated with single-label multi-class classification, where each example is associated with exactly one element of a set of class labels C with |C| > 2, is that the effectiveness of the classification tends to diminish as |C| increases. The ensemble methodology is considered to be one of the most effective strategies for handling the multi-class problem (Bauer & Kohavi, 1999; Dietterich, 2000; Hansen & Salamon, 1990; Jiawei et al., 2011; Opitz & Maclin, 1999; Oza & Tumer, 2008; Quinlan, 1996; Zhou, 2009). An ensemble model is a composite model comprising a number of learners (classifiers), often referred to as base learners or weak learners, that “collaborate” to obtain better classification performance than can be obtained from a single stand-alone model. The classifiers making up an ensemble can be arranged in three main formats: (i) concurrent (Breiman, 1996, 2001; Machová et al., 2006), (ii) sequential (Freund et al., 1999; Wirth & Catlett, 1988), and (iii) hierarchical (Athimethphat & Lerteerawong, 2012; Chen et al., 2004; Kumar et al., 2002; Lei & Govindaraju, 2005; Madzarov et al., 2008). A commonly adopted structure for hierarchical model generation is a binary tree constructed in either a bottom-up or a top-down manner (Beygelzimer et al., 2007; Kumar et al., 2002).
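To make the ensemble idea concrete, the following minimal sketch shows three base learners combined concurrently by majority voting. It is a hypothetical illustration using scikit-learn, not a method from this paper; the estimator and dataset choices are arbitrary.

```python
# Minimal concurrent-ensemble sketch (illustrative only; the cited works
# cover many ensemble variants beyond simple majority voting).
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Three base learners "collaborate" via (hard) majority voting.
ensemble = VotingClassifier(estimators=[
    ("tree", DecisionTreeClassifier(max_depth=3)),
    ("nb", GaussianNB()),
    ("lr", LogisticRegression(max_iter=1000)),
])
ensemble.fit(X_train, y_train)
print("ensemble accuracy:", ensemble.score(X_test, y_test))
```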
The novel idea presented in this paper is the generation and usage of a hierarchical ensemble classification model in which the base classifiers are arranged in the form of a Directed Acyclic Graph (DAG) structure, where each node in the DAG holds a classifier. Nodes near the root of the hierarchy hold classifiers designed to discriminate between groups of class labels, while the leaves hold classifiers designed to distinguish between individual class labels. To distribute class labels across the DAG nodes, the use of a “combination technique” is proposed. One of the most significant advantages of the DAG classification approach, compared with (say) the binary tree approach, is the ability to include a greater number of possible class label combinations at each level.

The work presented in this paper considers two alternative DAG structures to support the generation of the desired hierarchical classification model: (i) the rooted DAG, and (ii) the non-rooted DAG. The rooted DAG structure is the most straightforward; however, as will become apparent later in this paper, it entails a number of disadvantages in terms of scalability, effectiveness, and efficiency. The proposed non-rooted structure seeks to address these disadvantages. The features provided by the non-rooted DAG structure are that: (i) it eliminates the root node, where the largest number of class label combinations would have to be considered, and reduces the overall number of levels in the model (depth pruning); and (ii) it enables the application of breadth pruning, reducing the number of classifiers that need to be generated at each DAG level so as to reduce the overall size of the DAG further (note that breadth pruning cannot be applied to the rooted DAG approach because the rooted DAG requires the inclusion of all class combinations). An issue with respect to the non-rooted DAG structure, as the name implies, is the need to determine the “starting node” (the root) from which the classification process is to commence. To this end, it is proposed that a classifier generator that produces probability values, such as Naive Bayes classification, be used to determine the starting node; a sketch of the idea is given below.
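The following toy sketch renders these ideas in Python with scikit-learn. It is a hypothetical rendering only: the two-group partition, the DAGNode class, and the use of GaussianNB to select the starting node are assumptions made for illustration, and it does not reproduce the paper's actual combination technique or pruning schemes.

```python
# Hypothetical sketch: internal nodes separate groups of class labels,
# leaves separate individual labels, and (for the non-rooted variant)
# Naive Bayes probabilities select the starting node. All names, groupings,
# and estimator choices are invented for illustration.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier


class DAGNode:
    """A node holding a classifier plus routing to child nodes."""

    def __init__(self, clf, children=None):
        self.clf = clf                  # group-level or label-level classifier
        self.children = children or {}  # predicted group index -> child node

    def classify(self, x):
        pred = self.clf.predict(x.reshape(1, -1))[0]
        child = self.children.get(pred)
        return pred if child is None else child.classify(x)


# Four class labels partitioned into two groups: {0, 1} and {2, 3}.
X, y = make_classification(n_samples=600, n_features=20, n_informative=6,
                           n_classes=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
groups = [{0, 1}, {2, 3}]

# Leaf nodes: each distinguishes the individual labels within one group.
leaves = []
for g in groups:
    m = np.isin(y_tr, list(g))
    leaves.append(DAGNode(DecisionTreeClassifier(max_depth=4).fit(X_tr[m], y_tr[m])))

# Rooted variant: a root classifier first predicts the group index.
grp_target = np.array([0 if label in groups[0] else 1 for label in y_tr])
root = DAGNode(DecisionTreeClassifier(max_depth=4).fit(X_tr, grp_target),
               children={0: leaves[0], 1: leaves[1]})

# Non-rooted variant: skip the root; Naive Bayes class probabilities
# indicate the most likely label, and hence the starting node to use.
nb = GaussianNB().fit(X_tr, y_tr)

def classify_non_rooted(x):
    best_label = nb.classes_[np.argmax(nb.predict_proba(x.reshape(1, -1))[0])]
    start = leaves[0] if best_label in groups[0] else leaves[1]
    return start.classify(x)

rooted_acc = np.mean([root.classify(x) == t for x, t in zip(X_te, y_te)])
non_rooted_acc = np.mean([classify_non_rooted(x) == t for x, t in zip(X_te, y_te)])
print(f"rooted: {rooted_acc:.3f}  non-rooted: {non_rooted_acc:.3f}")
```

Note that with only two groups this toy example degenerates to a binary tree; the DAG advantage described above emerges when a level holds many overlapping class label combinations, which this small illustration elides.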