Introduction
Unlike the traditional single-label classification problem, the multilabel classification (MLC) problem allows a sample to belong to multiple label categories simultaneously. (For example, a news article can cover both technology and culture.) This flexibility lets multilabel classification model many real-world tasks, including text classification (Liu et al., 2021; Minaee et al., 2021; Nam et al., 2014), video annotation (Markatopoulou et al., 2018), image annotation (Lanchantin et al., 2021; Zhu et al., 2017), music classification (Tiple et al., 2022), and protein function prediction (Guan et al., 2018). In practical applications, labeling samples by hand is difficult and expensive, so solving the multilabel classification problem is valuable.
A straightforward solution to MLC is the binary relevance (BR) algorithm (Boutell et al., 2004), which transforms the original multilabel problem into a series of independent single-label problems. Although simple and efficient, BR does not use the correlation information among labels and therefore cannot achieve the best classification results. Exploiting the information hidden among the labels is a practical way to improve multilabel classification accuracy. Typical approaches include stacked binary relevance (2BR) (Godbole & Sarawagi, 2004), classifier chains (CC) (Read et al., 2011), multilabel k-nearest neighbor (ML-kNN) (Zhang & Zhou, 2007), and rank support vector machine (RankSVM) (Elisseeff & Weston, 2001), among others.
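The BR transformation can be sketched in a few lines. The following illustrative Python snippet is not from the paper: the nearest-centroid base learner and the names `br_fit` and `br_predict` are our own choices, standing in for any binary classifier. It trains one independent model per label and predicts each label separately, which is exactly why label correlations are never exploited:

```python
class NearestCentroid:
    """Deliberately simple binary base learner: predict the class whose
    training-sample centroid is closer to the query point."""

    def fit(self, X, y):
        pos = [x for x, t in zip(X, y) if t == 1]
        neg = [x for x, t in zip(X, y) if t == 0]
        self.c1 = [sum(col) / len(pos) for col in zip(*pos)] if pos else None
        self.c0 = [sum(col) / len(neg) for col in zip(*neg)] if neg else None
        return self

    def predict_one(self, x):
        def sqdist(c):
            return sum((a - b) ** 2 for a, b in zip(x, c))
        if self.c1 is None:
            return 0
        if self.c0 is None:
            return 1
        return 1 if sqdist(self.c1) <= sqdist(self.c0) else 0


def br_fit(X, Y, base=NearestCentroid):
    """Binary relevance: train one independent classifier per label column of Y."""
    n_labels = len(Y[0])
    return [base().fit(X, [row[j] for row in Y]) for j in range(n_labels)]


def br_predict(models, x):
    """Predict each label independently -- no label correlations are used."""
    return [m.predict_one(x) for m in models]
```

In practice the base learner would be a stronger binary classifier (logistic regression, SVM, etc.); the transformation itself is unchanged.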
The CC algorithm uses labels as additional features to exploit the correlations between labels. Concretely, it selects a label ordering; for each target label, all labels ranked before it serve as additional input features during training and prediction, yielding a chain of classifiers. The key to the CC algorithm is finding the optimal label ordering: if the predecessors of a label are highly correlated with it, the additional features can improve the performance of the corresponding classifier. The traditional CC algorithm determines the label ordering randomly, which leads to low classification performance and low robustness. To solve this problem, many variants of the CC algorithm have been proposed, such as probabilistic classifier chains (PCC) (Cheng et al., 2010), ensemble classifier chains (ECC) (Rokach, 2010), conditional entropy-based classifier chains (CEbCC) (Jun et al., 2019), and group sensitive classifier chains (GCC) (Huang et al., 2015). These algorithms improve the classification performance of CC, but their time complexity is high. Moreover, they mostly consider only the positive relationships between labels and ignore negative correlations. Another open problem is how to decide which of two correlated labels should precede the other in the chain.
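The chaining mechanism described above can be sketched as follows. This is an illustrative snippet only: the 1-nearest-neighbour base learner and the names `cc_fit` and `cc_predict` are our own, not part of any cited implementation. Each classifier in the chain is trained with the true values of all earlier labels appended to the feature vector; at prediction time, the earlier labels' *predicted* values are fed forward instead:

```python
class OneNN:
    """Minimal 1-nearest-neighbour binary classifier used as the base learner."""

    def fit(self, X, y):
        self.X, self.y = X, y
        return self

    def predict_one(self, x):
        best = min(range(len(self.X)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(x, self.X[i])))
        return self.y[best]


def cc_fit(X, Y, order, base=OneNN):
    """Train a classifier chain: labels are learned in `order`, and each
    classifier sees the true values of all earlier labels as extra features."""
    models = []
    for k, j in enumerate(order):
        X_aug = [list(x) + [row[p] for p in order[:k]] for x, row in zip(X, Y)]
        models.append(base().fit(X_aug, [row[j] for row in Y]))
    return models


def cc_predict(models, order, x):
    """Predict along the chain, feeding earlier predicted labels forward."""
    preds = {}
    for k, (model, j) in enumerate(zip(models, order)):
        x_aug = list(x) + [preds[p] for p in order[:k]]
        preds[j] = model.predict_one(x_aug)
    return [preds[j] for j in sorted(preds)]
```

The choice of `order` is exactly the difficulty noted above: a poor ordering gives each classifier uninformative extra features, and prediction errors early in the chain propagate to all later labels.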
To address these problems of CC-related algorithms, we propose a multilabel classifier chains algorithm based on a maximum spanning tree and directed acyclic graph (maxSTCC). The contributions of this paper are as follows: