Multilabel Classifier Chains Algorithm Based on Maximum Spanning Tree and Directed Acyclic Graph

The classifier chains algorithm solves the multilabel classification problem by arranging the labels into a random label order, and its classification effect depends heavily on whether that label order is optimal. To obtain a better label ordering, the authors propose a multilabel classifier chains algorithm based on a maximum spanning tree and a directed acyclic graph. The algorithm first uses Pearson's correlation coefficient to calculate the correlation between labels and constructs the maximum spanning tree of the labels, then calculates the mutual decision difficulty between labels to transform the maximum spanning tree into a directed acyclic graph, and uses topological sorting to output the optimized label ordering. Finally, the authors use the classifier chains algorithm to train and predict with this label ordering. Experimental comparisons were conducted between the proposed algorithm and other related algorithms on seven datasets; across six evaluation metrics, the proposed algorithm ranked first in 76.2% and second in 16.7% of the comparisons. The experimental results demonstrate the effectiveness of the proposed algorithm and affirm its contribution to exploring and utilizing label-related information.


INTRODUCTION
Unlike the traditional single-label classification problem, the multilabel classification (MLC) problem allows a sample to have multiple label categories simultaneously. (For example, a news article can belong to the topics of both technology and culture.) This ability means multilabel classification can model many real-world problems. Examples include text classification (Liu et al., 2021; Minaee et al., 2021; Nam et al., 2014), video annotation (Markatopoulou et al., 2018), image annotation (Lanchantin et al., 2021; Zhu et al., 2017), music classification (Tiple et al., 2022), and protein function prediction (Guan et al., 2018). In practical production applications, labeling samples by hand is difficult and expensive. Thus, solving the multilabel classification problem is valuable.
A straightforward solution to MLC is the binary relevance (BR) algorithm (Boutell et al., 2004). It transforms the original multilabel problem into a series of single-label problems. Although simple and efficient, this algorithm does not utilize the information shared between labels and therefore does not obtain the best possible classification results. The multilabel classification accuracy can be improved by using the information hidden between labels. Typical approaches include stacked binary relevance (2BR) (Godbole & Sarawagi, 2004), classifier chains (CC) (Read et al., 2011), multilabel k-nearest neighbor (ML-kNN) (Zhang & Zhou, 2007), and rank support vector machine (RankSVM) (Elisseeff & Weston, 2001), among others.
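The BR decomposition just described can be sketched in a few lines. This is an illustrative sketch only: the function names and the choice of logistic regression as the binary base learner are ours, not the paper's.

```python
# Binary relevance sketch: one independent binary classifier per label.
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_br(X, Y):
    """Train one classifier per label column of Y (shape n x q)."""
    return [LogisticRegression().fit(X, Y[:, j]) for j in range(Y.shape[1])]

def predict_br(classifiers, X):
    """Stack the independent per-label predictions into an n x q matrix."""
    return np.column_stack([clf.predict(X) for clf in classifiers])

# Toy data: 6 samples, 2 features, 3 labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))
Y = (rng.random((6, 3)) > 0.5).astype(int)
# Force both classes into every label column so each binary fit is valid.
Y[0] = [0, 0, 0]
Y[1] = [1, 1, 1]

models = train_br(X, Y)
Y_hat = predict_br(models, X)
```

Note that nothing here couples the labels: each classifier sees only the original features, which is exactly the limitation the CC family addresses.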
The CC algorithm uses labels as additional features to exploit the correlation information between labels. The specific practice is to select a label ordering; for each target label, all labels ranked before it are used as additional features during training and prediction, finally yielding a chain of multilabel classifiers. The key to using the CC algorithm well is finding the optimal label ordering: if the predecessors of a label are highly correlated with it, the additional features can improve the performance of the corresponding classifier. The traditional CC algorithm determines the label ordering randomly, which yields low classification performance and low robustness. To solve this problem, many variants of the CC algorithm have been proposed, such as probabilistic classifier chains (PCC) (Cheng et al., 2010), ensemble classifier chains (ECC) (Rokach, 2010), conditional entropy-based classifier chains (CEbCC) (Jun et al., 2019), and group sensitive classifier chains (GCC) (Huang et al., 2015). These algorithms improve the classification performance of the CC algorithm, but their time complexity is high. Also, they mostly consider only positive relationships between labels and ignore negative correlation. Another open problem is how to decide the backward and forward order of two correlated labels.
To address these problems of the CC-related algorithms, we propose a multilabel classifier chains algorithm based on a maximum spanning tree and directed acyclic graph (maxSTCC). The contributions of this paper are as follows:

1. The Pearson correlation coefficient (Sinhashthita & Jearanaitanakij, 2020) is used to calculate the degree of correlation between labels, and its absolute value is taken so that both positive and negative correlations count as correlation. An undirected weighted graph of labels is constructed, where the vertices represent labels and the edge weights indicate the degree of correlation between connected labels.
2. The maximum spanning tree algorithm transforms the undirected weighted graph of labels into a maximum spanning tree so as to maximize the utilization of the correlation information between labels.
3. Conditional entropy is used to define the mutual decision difficulty between two connected labels in the maximum spanning tree; the direction with lower decision difficulty is taken as the dependence direction between the two labels, finally transforming the maximum spanning tree of labels into a directed acyclic graph (DAG). This solves the problem of how to order two related labels.
4. To illustrate the contribution made by the maximum spanning tree, a classifier chains algorithm that constructs the DAG directly from conditional entropy (CEDAGCC) is proposed as a control algorithm. This algorithm first constructs a directed cyclic graph (DCG) of labels from conditional entropy and then converts the DCG into a DAG by removing the rings in the DCG.
5. The labels in the DAG are output as a label ordering using topological ordering, and the CC algorithm trains and predicts based on that ordering.

The maxSTCC algorithm is experimentally compared with other related algorithms, and the experimental results show that it obtains a more stable and better label ordering and better classification performance.

We let $D = \{(x_i, y_i)\}_{i=1}^{n}$ denote the training dataset, which consists of $n$ instances, and use $x_i \in \mathbb{R}^d$ and $y_i \in \{0,1\}^q$ to denote the feature data and label data of the $i$th sample $(x_i, y_i)$, where $d$ and $q$ denote the number of features and the number of labels, respectively. We let $X$ and $Y$ denote the feature dataset and the label dataset and use $L = \{l_1, l_2, \ldots, l_q\}$ to denote the set of $q$ labels. When a training sample $(x_i, y_i)$ is tagged with $l_j$, then $y_{i,j} = 1$; otherwise, $y_{i,j} = 0$.

Multilabel Classification
The utilization of the information brought by the correlation between labels to improve classification performance has been an active research topic in recent years. According to the level of label correlation considered by multilabel classification algorithms, existing algorithms can be classified into first-order strategies, second-order strategies, and higher-order strategies (Zhang & Zhou, 2013).
First-order strategies do not consider correlations between labels; they train and predict each label in turn. The BR algorithm and the ML-kNN algorithm are first-order strategy algorithms. The BR algorithm treats the classification of each label as a separate single-label problem and trains a classifier for each label using the full feature dataset. The ML-kNN algorithm handles the multilabel classification problem with a simple improvement of the kNN algorithm: it counts the occurrences of each label among the k nearest neighbors independently, without considering the dependencies between labels.
Second-order strategies examine correlations between pairs of labels. Multilabel algorithms of this strategy achieve improved classification results compared with first-order strategies. The calibrated label ranking (CLR) algorithm (Fürnkranz et al., 2008) and the RankSVM algorithm are two second-order strategy algorithms. The CLR algorithm sorts and splits the labels by pairwise comparison to deal with multilabel classification problems. The RankSVM algorithm measures the correlation between relevant and irrelevant label pairs through the label ranking loss function and constructs a convex quadratic optimization problem to solve the multilabel classification problem.
Higher-order strategies consider higher-order correlations between labels (e.g., the relevance of a label to all remaining labels). Multilabel algorithms using higher-order strategies obtain the best classification results, but the time complexity increases with the number of label correlations considered. The 2BR and CC algorithms address the BR algorithm's inability to exploit label relevance by using labels as input features of the feature space; both are higher-order strategy algorithms that improve on the BR algorithm. The 2BR algorithm considers label relevance by stacking two layers of the BR algorithm: the predicted labels of the first layer are used as input features in the second layer, and the predicted labels of the second layer are the final result. The CC algorithm forms a chain and trains each label's classifier using all labels ranked before it as additional input features. Label-related information is passed along the chain while the low time complexity of the BR algorithm is retained, but the classification effect of the CC algorithm is vulnerable to the label ordering.
The focus of this research is to establish an optimal label ordering to improve the CC algorithm. The probabilistic classifier chains (PCC) algorithm (Cheng et al., 2010) finds the label sequence with the highest confidence by iterating over all label orderings; however, due to its high time complexity, it can only be used on datasets with a small number of labels. The ECC algorithm determines the final prediction by training multiple random classifier chains and voting over the predictions of each chain. The CEbCC algorithm first calculates the conditional entropy between labels and then ranks the labels by the sum of their conditional entropies. The Bayesian chain classifiers (BCC) algorithm (Zaragoza et al., 2011) and the Bayesian network-based label correlation analysis for multilabel classifier chains (BNCC) algorithm (Wang et al., 2021) determine the label ordering by building a Bayesian network of labels. The association rules-based classifier chains (ARECC) algorithm (Jiaman et al., 2022) ranks labels by mining association rules between them.

CC Algorithm
The main task of multilabel classification is to establish the correspondence from the data feature space to the label space. Supposing $h_j$ denotes the mapping of the feature data $X$ to the $j$th label, $h_j : X \to l_j$, where $l_j$ takes the value 0 or 1, then $h_j$ is the binary classifier for the $j$th label. The classical BR algorithm trains a separate classifier for each label independently, for a total of $q$ binary classifiers: $h_1, h_2, \ldots, h_q$. Using label relevance to improve classification performance is the focus of multilabel classification research. To address the problem that the BR algorithm cannot utilize label relevance, the CC algorithm introduces label relevance by using labels as additional feature dimensions. The specifics are shown in Table 1.
For the $j$th label, $1 \le j \le q$, the following equation describes the training process of the CC algorithm: the classifier $h_j$ is trained on the feature data extended with the true values of the preceding labels in the chain,

$$h_j : (x_i, y_{i,1}, \ldots, y_{i,j-1}) \mapsto y_{i,j}, \quad i = 1, \ldots, n$$

Similarly, the following equation describes the predicted label $\hat{y}$ of $x$: prediction proceeds along the chain, feeding each predicted label forward as an additional feature,

$$\hat{y}_j = h_j(x, \hat{y}_1, \ldots, \hat{y}_{j-1}), \quad j = 1, \ldots, q$$
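The training and prediction steps above can be sketched as a short hand-rolled loop (scikit-learn also ships a `ClassifierChain` class). The function names, the toy data, and the logistic-regression base learner are our illustrative choices, not the paper's code.

```python
# Classifier chains sketch for a fixed label order: each classifier sees
# the original features plus the labels that precede it in the chain.
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_cc(X, Y, order):
    chain = []
    for j, label in enumerate(order):
        # True values of the preceding chain labels become extra features.
        X_ext = np.hstack([X, Y[:, order[:j]]])
        chain.append(LogisticRegression().fit(X_ext, Y[:, label]))
    return chain

def predict_cc(chain, X, order):
    n, q = X.shape[0], len(order)
    Y_hat = np.zeros((n, q), dtype=int)
    for j, label in enumerate(order):
        # At test time the *predicted* preceding labels are fed forward.
        X_ext = np.hstack([X, Y_hat[:, order[:j]]])
        Y_hat[:, label] = chain[j].predict(X_ext)
    return Y_hat

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 3))
Y = (rng.random((8, 3)) > 0.5).astype(int)
Y[0], Y[1] = [0, 0, 0], [1, 1, 1]   # ensure both classes per label
order = [2, 0, 1]                   # an assumed label ordering
chain = train_cc(X, Y, order)
Y_hat = predict_cc(chain, X, order)
```

The quality of `order` is exactly what the proposed maxSTCC algorithm optimizes.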
PROPOSED METHOD

Undirected Weighted Graph of Labels
The Pearson correlation coefficient is widely used to measure the degree of correlation between two variables and takes a value between -1 and 1. When the value is 0, the two variables are not correlated at all. When the value is greater than 0, the two variables are positively correlated; when it is less than 0, they are negatively correlated. The larger the absolute value, the stronger the correlation.
A small modification of the Pearson correlation coefficient is used to calculate the degree of correlation between two labels (Tsoumakas et al., 2009). The relevance of labels $l_j$ and $l_k$ in this paper is defined as follows:

$$r(l_j, l_k) = \frac{AD - BC}{\sqrt{(A+B)(C+D)(A+C)(B+D)}}$$

where $A$, $B$, $C$, and $D$ denote the four combinations of statistics for labels $l_j$ and $l_k$, calculated as

$$A = \sum_{i=1}^{n} [\![\, y_{i,j} = 1 \wedge y_{i,k} = 1 \,]\!], \qquad B = \sum_{i=1}^{n} [\![\, y_{i,j} = 1 \wedge y_{i,k} = 0 \,]\!]$$

$$C = \sum_{i=1}^{n} [\![\, y_{i,j} = 0 \wedge y_{i,k} = 1 \,]\!], \qquad D = \sum_{i=1}^{n} [\![\, y_{i,j} = 0 \wedge y_{i,k} = 0 \,]\!]$$

where $[\![\, \cdot \,]\!]$ takes the value 1 when the condition inside the bracket holds; otherwise, it takes the value 0.

To measure the correlation between labels more comprehensively, both negative and positive correlations are treated as correlation, so the absolute value $|r(l_j, l_k)|$ is used as the correlation measure. By calculating the correlation degree between every two labels, a label correlation matrix $R = \big(|r(l_j, l_k)|\big)_{q \times q}$ is obtained. From $R$, a weighted undirected connected graph $G = (V, E, W)$ can be constructed with labels as vertices and label correlations as edge weights, where $V = L$ and $W(l_j, l_k) = |r(l_j, l_k)|$. The values in the adjacency matrix $A$ of $G$ are the weights between the corresponding pairs of vertices; since the correlation measure is symmetric, $A$ is a symmetric matrix.
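The correlation computation above can be sketched directly from the counts $A$, $B$, $C$, $D$; we assume the standard phi form of Pearson's coefficient for binary variables, and the function name is ours.

```python
# Label-correlation matrix sketch: phi (Pearson) coefficient from the 2x2
# contingency counts, with the absolute value used as the edge weight.
import numpy as np

def label_correlation(Y):
    """Y: n x q binary label matrix; returns the q x q weight matrix |r|."""
    n, q = Y.shape
    R = np.zeros((q, q))
    for j in range(q):
        for k in range(q):
            if j == k:
                continue  # no self-loops in the label graph
            A = np.sum((Y[:, j] == 1) & (Y[:, k] == 1))
            B = np.sum((Y[:, j] == 1) & (Y[:, k] == 0))
            C = np.sum((Y[:, j] == 0) & (Y[:, k] == 1))
            D = np.sum((Y[:, j] == 0) & (Y[:, k] == 0))
            denom = np.sqrt((A + B) * (C + D) * (A + C) * (B + D))
            R[j, k] = (A * D - B * C) / denom if denom > 0 else 0.0
    return np.abs(R)  # adjacency matrix of the undirected weighted graph

Y = np.array([[1, 1, 0],
              [1, 1, 0],
              [0, 0, 1],
              [0, 1, 1]])
W = label_correlation(Y)
```

In this toy example labels 0 and 2 are perfectly negatively correlated, and taking the absolute value keeps that strong relationship as a heavy edge.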

Maximum Spanning Tree of Labels
A connected graph without loops is called a tree, and a spanning tree is a connected spanning subgraph without loops of an undirected connected graph. As an important object in graph theory, the spanning tree is widely used in fields such as network optimization, data structures, engineering, and combinatorial optimization. The spanning tree with the largest sum of edge weights among all spanning trees of a graph is the maximum spanning tree. Commonly used maximum spanning tree algorithms include Kruskal's algorithm, Prim's algorithm, and the broken circle method. We use the idea of Prim's algorithm to construct the maximum spanning tree $T = (V_{tree}, E_{tree})$ of labels. The specific process is shown in Algorithm 1.
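A minimal Prim-style construction of the maximum spanning tree over the label graph, in the spirit of Algorithm 1 (which is not reproduced verbatim here); variable names are ours.

```python
# Prim-style maximum spanning tree: grow the tree from one vertex,
# repeatedly adding the heaviest edge that reaches a new vertex.
import numpy as np

def max_spanning_tree(W):
    """W: symmetric q x q weight matrix; returns a list of q-1 tree edges."""
    q = W.shape[0]
    in_tree = {0}            # start from an arbitrary label vertex
    edges = []
    while len(in_tree) < q:
        best = None
        for u in in_tree:
            for v in range(q):
                if v not in in_tree:
                    if best is None or W[u, v] > W[best[0], best[1]]:
                        best = (u, v)
        edges.append(best)
        in_tree.add(best[1])
    return edges

W = np.array([[0.0, 0.9, 0.2],
              [0.9, 0.0, 0.5],
              [0.2, 0.5, 0.0]])
tree = max_spanning_tree(W)   # q - 1 = 2 edges
```

For three labels the tree keeps the two heaviest correlations (0.9 and 0.5) and drops the weak 0.2 edge.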

Directed Acyclic Graph of Labels
The maximum spanning tree $T = (V_{tree}, E_{tree})$ of the labels is obtained as above. The maximum spanning tree of labels is constructed to maximize the consideration of label relevance and thus optimize the label ordering, improving the classification performance of the classifier chains algorithm. To derive the label ordering, the maximum spanning tree of labels is converted into a DAG. The key issue in this process is determining the direction of each edge of the maximum spanning tree.
We first use information entropy to define the uncertainty of a label. The uncertainty of label $l_j \in L$ is

$$H(l_j) = -\sum_{v \in \{0,1\}} p(y_j = v) \log p(y_j = v)$$

The uncertainty $H(l_j)$ of label $l_j$ is minimized when all values of label $l_j$ are 1 or all are 0, and maximized when half of the values of label $l_j$ are 1 and half are 0.
Conditioned on a given label $l_k \in L$, the uncertainty of label $l_j$ is defined by the conditional entropy as follows:

$$H(l_j \mid l_k) = \sum_{v \in \{0,1\}} p(y_k = v)\, H(l_j \mid y_k = v) = -\sum_{v \in \{0,1\}} p(y_k = v) \sum_{u \in \{0,1\}} p(y_j = u \mid y_k = v) \log p(y_j = u \mid y_k = v)$$

(Algorithm 1. Generate maximum spanning tree of labels.)

From the above equation, we obtain the following properties of the label conditional uncertainty. The reduction $H(l_j) - H(l_j \mid l_k)$ indicates the amount of information carried by $l_k$ about $l_j$: the larger this value is, the more information is carried.

When $H(l_j \mid l_k)$ takes its minimum value, label $l_k$ can completely predict the value of label $l_j$.

When $H(l_j \mid l_k)$ takes its maximum value, label $l_k$ contributes nothing to predicting the value of label $l_j$.

In general, $H(l_j \mid l_k) \neq H(l_k \mid l_j)$, which indicates that the conditional entropy is asymmetric: there is a difference between the uncertainty of $l_j$ given $l_k$ and the uncertainty of $l_k$ given $l_j$.
Based on these properties, $H(l_j \mid l_k)$ measures the difficulty of deciding label $l_j$ given label $l_k$. The decision difficulty $I(l_k \to l_j)$ of the directed edge from label $l_k$ to label $l_j$ is therefore defined as

$$I(l_k \to l_j) = H(l_j \mid l_k)$$

The maximum spanning tree is a connected graph without loops, so it can be transformed into a DAG by determining the direction of each edge. For each edge $(l_j, l_k)$ in the maximum spanning tree, we calculate $I(l_k \to l_j)$ and $I(l_j \to l_k)$, compare them, and assign the edge the direction with the lower decision difficulty. Once the direction of every edge is determined, the maximum spanning tree becomes a DAG. The specific process is shown in Algorithm 2.
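The edge-orientation step can be sketched as follows, assuming the decision difficulty is the conditional entropy as above. The helper names are ours, and the entropy uses the natural logarithm (the base does not affect the comparison).

```python
# Orient each tree edge toward the direction of lower decision difficulty:
# point k -> j when H(l_j | l_k) <= H(l_k | l_j).
import numpy as np

def cond_entropy(yj, yk):
    """H(l_j | l_k) for binary label columns yj, yk (natural log)."""
    n = len(yk)
    h = 0.0
    for v in (0, 1):
        mask = (yk == v)
        pv = mask.sum() / n
        if pv == 0:
            continue
        p1 = yj[mask].mean()          # p(y_j = 1 | y_k = v)
        for p in (p1, 1 - p1):
            if p > 0:
                h -= pv * p * np.log(p)
    return h

def orient(Y, edges):
    """Turn undirected tree edges (j, k) into directed edges."""
    directed = []
    for j, k in edges:
        if cond_entropy(Y[:, j], Y[:, k]) <= cond_entropy(Y[:, k], Y[:, j]):
            directed.append((k, j))   # k decides j more easily: edge k -> j
        else:
            directed.append((j, k))
    return directed

Y = np.array([[1, 1], [1, 1], [0, 0], [0, 1]])
d = orient(Y, [(0, 1)])
```

In this toy data, label 0 predicts label 1 more easily than the reverse, so the edge is oriented from label 0 to label 1.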

Topological Sorting
We have obtained a DAG of labels in which every two connected labels have an anterior-posterior ordering. To obtain the final label ordering, we use topological ordering, which provides an efficient way to output the vertices of a DAG. Topological ordering is commonly used to solve engineering scheduling problems in AOV networks, where the tasks ranked first are the ones that need to be completed first. In the DAG of labels, the labels with lower decision difficulty for (i.e., a greater degree of influence on) the target label must be placed ahead of the target label so that label information can be delivered correctly along the label ordering. The specific process is shown in Algorithm 3.
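The topological ordering step can be sketched with Kahn's algorithm; labels with in-degree zero (labels that nothing depends on) are output first. Names are ours.

```python
# Kahn's algorithm: repeatedly output a vertex with in-degree zero and
# remove its outgoing edges.
from collections import deque

def topo_sort(q, directed_edges):
    indeg = [0] * q
    succ = [[] for _ in range(q)]
    for u, v in directed_edges:
        succ[u].append(v)
        indeg[v] += 1
    queue = deque(j for j in range(q) if indeg[j] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return order   # valid label ordering for the classifier chain

# Diamond-shaped DAG over 4 labels: 0 precedes 1 and 2, which precede 3.
order = topo_sort(4, [(0, 1), (0, 2), (1, 3), (2, 3)])
```

The resulting `order` is the label sequence handed to the CC training step.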

Time Complexity Analysis of Optimal Label Ordering
To optimize the label ordering of the classifier chains algorithm, we construct the relevance matrix of labels, the maximum spanning tree of labels, and the DAG of labels, and then output the optimized label ordering through topological sorting. Since the Pearson correlation coefficient is symmetric, the time complexity of constructing the relevance matrix of labels is $O(q^2/2)$. The time complexity of constructing the maximum spanning tree of labels using Prim's algorithm is $O(q^2)$. The time complexity of transforming the maximum spanning tree of labels into the DAG of labels is $O(q)$. The time complexity of using topological sorting to output the optimized label ordering is $O(q + e)$, where $e$ is the number of edges. The total time complexity of computing the optimized label ordering is therefore $O(3q^2/2 + 2q + e)$.

Control Experimental Algorithm
To investigate whether building a maximum spanning tree of labels effectively optimizes the label ordering and improves the final classification performance, we designed a classifier chains algorithm based on conditional entropy to construct a directed acyclic graph (CEDAGCC) as a control algorithm; it directly constructs a DAG of labels based on the mutual decision difficulty between labels. By calculating the mutual decision difficulties $I(l_j \to l_k)$ and $I(l_k \to l_j)$ for every pair of labels $l_j$ and $l_k$, a directed cyclic graph (DCG) of labels is obtained with a total of $q(q-1)$ directed edges, where $q$ is the number of labels. To obtain a label ordering, the DCG must be converted to a DAG. According to the above analysis, the smaller $I(l_j \to l_k)$ is, the greater the effect of $l_j$ on $l_k$ is. Therefore, the DCG can be transformed into a DAG by removing, in each ring of the DCG, the edge with the greatest decision difficulty (i.e., the least influence). Algorithm 4 illustrates the process of converting the DCG of labels into a DAG.
The time complexity of disconnecting the rings $Cyc$ is linear, $O(|Cyc|)$, where $|Cyc|$ denotes the total number of edges in $Cyc$. After obtaining the DAG of labels, the labels are output as a label ordering using Algorithm 3, and finally, this label ordering is trained and predicted with the CC algorithm.

Datasets
To verify the effectiveness of the algorithm proposed in this paper, seven datasets were selected from the publicly available multilabel dataset repository Mulan (Tsoumakas et al., 2011). Mulan is a Java library for learning from multilabel data and is widely used to test the performance of multilabel classifiers.

These seven datasets relate to several domains, including music, image, bioinformation, and text. Basic statistical information on the selected datasets is shown in Table 2.
Cardinality indicates the average number of labels per sample. The calculation formula is as follows:

$$Cardinality = \frac{1}{n} \sum_{i=1}^{n} |Y_i|$$

where $Y_i$ denotes the set of relevant labels of the $i$th sample.
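The cardinality statistic can be illustrated in one line:

```python
# Label cardinality: average number of relevant labels per sample.
import numpy as np

def cardinality(Y):
    """Y: n x q binary label matrix."""
    return Y.sum(axis=1).mean()

Y = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 1, 1]])
card = cardinality(Y)   # (2 + 1 + 3) / 3 = 2.0
```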

Evaluation Metrics and Comparable Algorithms
Since each sample in multilabel classification has multiple labels at the same time, the common single-label evaluation metrics cannot fully and accurately evaluate multilabel classification results. To measure the strengths and weaknesses of multilabel classification algorithms, we use six evaluation metrics that are widely used in multilabel classification.
Jaccard similarity compares the similarity and difference between the predicted and actual label sets of each sample: it evaluates the average proportion of correctly predicted labels among the union of predicted and actual labels, and it reaches its maximum only when the predicted label set and the actual label set are identical.

$$Exact\ Match = \frac{1}{n} \sum_{i=1}^{n} [\![\, y_i = \hat{y}_i \,]\!]$$

A sample is correctly predicted only when all of its labels are correctly predicted. Exact match indicates the percentage of correctly predicted samples, and a higher value indicates better classification.
F1 is a composite index of classification effectiveness: the harmonic mean of the precision and recall of the samples on the labels. A higher F1 indicates better classification effectiveness.

$$macroF1 = \frac{1}{q} \sum_{j=1}^{q} \frac{2\, p_j r_j}{p_j + r_j}$$

The macro F1 score averages the harmonic mean of precision $p_j$ and recall $r_j$ over all labels. Higher scores indicate that the algorithm performs well on low-frequency labels.

$$microF1 = \frac{2 \sum_{i=1}^{n} \sum_{j=1}^{q} y_{i,j}\, \hat{y}_{i,j}}{\sum_{i=1}^{n} \sum_{j=1}^{q} y_{i,j} + \sum_{i=1}^{n} \sum_{j=1}^{q} \hat{y}_{i,j}}$$

The micro F1 focuses on the prediction of each individual label decision and is affected by false negatives and false positives: it pools precision and recall over all labels before combining them. Higher scores indicate better performance of the algorithm on high-frequency labels.

$$Ranking\ Loss = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{|Y_i|\,|\bar{Y}_i|} \left| \left\{ (y_a, y_b) \mid r_i(y_a) > r_i(y_b),\ (y_a, y_b) \in Y_i \times \bar{Y}_i \right\} \right|$$

where $\bar{Y}_i$ is the complement of $Y_i$ with respect to the label set $L$ and $r_i$ denotes the ranking function. Ranking loss (Tsoumakas et al., 2011) counts how often irrelevant labels are ranked higher than relevant ones. The smaller the ranking loss, the higher the probability of a correct ranking and the better the classification model.
The six evaluation metrics above measure the classification results from different perspectives. For the first five metrics, a larger value means better classification performance; for the last metric, ranking loss, a smaller value means better classification performance.
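Two of the per-sample metrics above can be sketched directly from their definitions (our code, not the paper's; scikit-learn provides equivalent `jaccard_score` and `accuracy_score` routines for multilabel data):

```python
# Jaccard similarity and exact match over an n x q binary label matrix.
import numpy as np

def jaccard(Y, Y_hat):
    inter = np.logical_and(Y, Y_hat).sum(axis=1)
    union = np.logical_or(Y, Y_hat).sum(axis=1)
    # Treat an all-empty union as a perfect match for that sample.
    return np.where(union == 0, 1.0, inter / np.maximum(union, 1)).mean()

def exact_match(Y, Y_hat):
    # Fraction of samples whose entire label vector is predicted correctly.
    return np.all(Y == Y_hat, axis=1).mean()

Y     = np.array([[1, 0, 1], [0, 1, 0]])
Y_hat = np.array([[1, 0, 1], [1, 1, 0]])
j  = jaccard(Y, Y_hat)       # (2/2 + 1/2) / 2 = 0.75
em = exact_match(Y, Y_hat)   # 1/2 = 0.5
```

Note how the second sample counts partially toward Jaccard but not at all toward exact match, which is why exact match is the stricter metric.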
Five related algorithms were selected for comparison with the two experimental algorithms proposed in this paper. The details are as follows:

1. BR algorithm: a classical first-order strategy algorithm that trains a classifier for each label independently, without considering relationships between labels.
2. 2BR algorithm: a stacked-structure algorithm that uses the predicted labels of the first layer as extended features in the second layer to exploit label correlation.
3. CC algorithm: the classifier chains algorithm with a random label ordering.
4. ECC algorithm: improves the classification effect by training multiple classifier chains; in this paper, the number of classifier chains trained for each dataset is uniformly set to 5.
5. CEbCC algorithm: obtains the label ordering by computing the conditional entropy between labels and then ranking them statistically.
6. CEDAGCC algorithm: the controlled experimental algorithm proposed in this paper to test whether constructing a maximum spanning tree of labels effectively utilizes label-relevance information.
7. maxSTCC algorithm: the algorithm proposed in this paper, which optimizes the label ordering by constructing a maximum spanning tree and a directed acyclic graph.
We chose the BR, 2BR, and CC algorithms as comparison algorithms because both 2BR and CC use labels as additional features to address the BR algorithm's inability to utilize label-correlation information: the 2BR algorithm uses all labels as additional features, whereas the CC algorithm uses only the labels ranked before the target label. The ECC and CEbCC algorithms, like the CEDAGCC and maxSTCC algorithms proposed in this paper, improve classification performance by optimizing the label ordering of the CC algorithm.

Experiment Setup
Each experimental dataset is randomly shuffled and divided into five equal parts, four of which are selected as the training dataset and the remaining one as the test dataset. The experiment is then conducted using fivefold cross-validation, and the mean of the five runs is taken as the result of one experiment.
Since the base classifiers trained by all algorithms are binary, the algorithm in this study and the comparison algorithms uniformly use a support vector machine (SVM) with a linear kernel as the base classifier (Sun et al., 2014; Tsoumakas et al., 2010; Vapnik, 1996; Wang et al., 2013). The penalty factor C of the SVM is a key parameter affecting its performance: when C is large, it may lead to overfitting, and when C is small, to underfitting. The CAL500 and birds datasets were selected as experimental subjects to analyze the effect of the penalty coefficient C on the experiments. The effect of C on the CC algorithm is studied by varying C over the range [1E-1, 3E-2, 1E-2, 1E-3, 3E-4, 1E-4, 3E-5, 1E-5, 3E-6, 1E-6]. Both the proposed algorithms and the comparison algorithms use the CC algorithm as their base, and all the CC-related algorithms only optimize the label ordering, so studying the impact of the C value on the CC algorithm generalizes to all of them. Figures 1 and 2 show the experimental results.
In Figures 1 and 2, macro F1, which evaluates multilabel classification results comprehensively, is selected as the evaluation metric to test the effect of the penalty coefficient C of the SVM classifier on the CC algorithm. On the birds dataset in Figure 1, the macro F1 metric achieves its maximum when C is 1E-1, indicating the best classification effect, and its minimum when C is 1E-4, indicating the worst. On the CAL500 dataset in Figure 2, macro F1 is maximized when C is 3E-2, indicating the best classification effect, and minimized when C is 1E-6, indicating the worst.
As Figures 1 and 2 show, the value of the penalty coefficient C directly affects the classification results of the CC algorithm, and the best-performing C value differs across datasets. To let each algorithm obtain its best classification performance, C is tuned over the range [1E-1, 3E-2, 1E-2, 1E-3, 3E-4, 1E-4, 3E-5, 1E-5, 3E-6, 1E-6] in the experiments, and twofold cross-validation is performed on the training set to select the C value with the best validation performance.
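The C-selection step just described can be sketched as follows. This is an assumed reconstruction (not the paper's code) using scikit-learn's `LinearSVC` and `cross_val_score`, with the twofold split and the grid quoted in the text.

```python
# Select the SVM penalty C by 2-fold cross-validation on the training split.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

C_GRID = [1e-1, 3e-2, 1e-2, 1e-3, 3e-4, 1e-4, 3e-5, 1e-5, 3e-6, 1e-6]

def select_C(X, y):
    """Return the C value with the best mean 2-fold validation accuracy."""
    scores = {C: cross_val_score(LinearSVC(C=C), X, y, cv=2).mean()
              for C in C_GRID}
    return max(scores, key=scores.get)

# Toy binary task: label depends on the sign of the first feature.
rng = np.random.default_rng(2)
X = rng.normal(size=(40, 4))
y = (X[:, 0] + 0.1 * rng.normal(size=40) > 0).astype(int)
best_C = select_C(X, y)
```

In the full pipeline this selection would be run once per base classifier on the training fold before the final CC training.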
To avoid random effects, we repeated the experiment 10 times for each algorithm on each dataset and took the mean and standard deviation of each metric as the final result. The experimental hardware and software environment is an Intel Core i7-4790 CPU, 8 GB of memory, and 64-bit Windows 10. All experiments in this paper were implemented in Python with the toolkit provided by the scikit-learn platform.

RESULTS AND DISCUSSION
Tables 3 to 8 below show the evaluation results and standard deviations of the different evaluation metrics for the algorithms in this study and the comparison algorithms on seven publicly available datasets. In the tables, ↑ indicates that a larger value of the metric means a better classification effect, and ↓ indicates that a smaller value means a better classification effect. Bold indicates the best evaluation result, and the numbers in parentheses indicate the ranking of the algorithms under the same metric on the same dataset.
As seen in Tables 3 to 8, the maxSTCC algorithm achieved relatively good performance on all datasets: among the evaluation results on the seven datasets, it ranked first in 76.2% and second in 16.7% of the cases.
The maxSTCC algorithm achieves optimal results on six datasets and a suboptimal result on the Enron dataset in Table 3, which illustrates that the algorithm maximizes the prediction of the correct category for each label. On the CAL500 dataset in Table 4, the exact match metric is 0 for all algorithms, indicating that no sample was predicted entirely correctly; on the remaining six datasets, the maxSTCC algorithm obtains a suboptimal result only on the yeast dataset. Tables 3 and 4 show that the algorithm improves both the accuracy of label prediction and the rate of entirely correct sample prediction. In Tables 5 to 7, the maxSTCC algorithm performs well on the comprehensive evaluation metrics F1, macro F1, and micro F1, achieving only suboptimal performance in a few cases. In Table 8, the maxSTCC algorithm also achieves good results on the ranking loss metric. Across Tables 3 to 8, the 2BR, CC, ECC, CEbCC, CEDAGCC, and maxSTCC algorithms, which consider label correlation, improve the classification results compared with the BR algorithm, which does not consider label correlation at all. This indicates that using label correlation can improve the results of multilabel classification algorithms. The 2BR algorithm uses the predicted labels of the first layer as input features for the second layer to exploit label correlations; if the first-layer classifier predicts incorrect labels, it may introduce incorrect label correlations into the second layer, training a second-layer classifier with poor performance and ultimately leading to poor classification results. The tables show that the 2BR algorithm is only slightly superior to the BR algorithm overall.

Stability Analysis of the maxSTCC Algorithm
In terms of algorithm stability, the classifier chains algorithm and its improved variants are prone to unstable performance when the label dimension of the dataset is high. Unlike the CC algorithm, which randomly selects the label ordering, the maxSTCC algorithm explores the dependency information among labels by constructing the maximum spanning tree and the DAG of labels and thereby obtains a more stable label ordering, and thus more stable classification performance.
To effectively illustrate the stability of the maxSTCC algorithm, two datasets, CAL500 and bibtex, which have 174 and 159 labels, respectively, were selected for observation. As shown in Tables 3 to 8, the stability (standard deviation) of the CEDAGCC and maxSTCC algorithms proposed in this paper is better than that of the CC, ECC, and CEbCC algorithms. In particular, the maxSTCC algorithm maximizes the utilization of correlation information among labels by constructing the maximum spanning tree of labels, which further improves the stability of the label ordering compared with the CEDAGCC algorithm. Therefore, the performance of the maxSTCC algorithm is more stable.

CONCLUSION AND FUTURE WORK
In this paper, we propose a new multilabel classification algorithm (maxSTCC), which improves the classification performance of the CC algorithm by building a maximum spanning tree of labels and transforming it into a directed acyclic graph to obtain a better label ordering.The maxSTCC algorithm has the following main contributions: 1) using the Pearson correlation coefficient to measure the degree of correlation between labels and taking the absolute value to consider the positive correlation and negative correlation, 2) constructing a maximum spanning tree of labels to maximize the utilization of the correlation information between labels, and 3) using conditional entropy to define the mutual decision difficulty between two related labels and using the direction with less decision difficulty as the dependency direction of these two labels to solve the ranking problem between two related labels.
Algorithm 2. Transformation of maximum spanning tree into DAG

Figure 1. Effect of C-value in CAL500 data on the CC algorithm

Figure 3. Average ranking of seven algorithms on six metrics