Effective Multi-Label Classification Using Data Preprocessing

Vaishali S. Tidake, Shirish S. Sane
DOI: 10.4018/978-1-7998-7371-6.ch005

Abstract

Feature similarity is the natural choice when nearest neighbors are to be explored. Examples in multi-label datasets, however, are associated with multiple labels, so label dissimilarity used alongside feature similarity may reveal better neighbors. The devised MLFLD and MLFLD-MAXP algorithms exploit the information extracted from such neighbors. Among the three distance metrics used to compute label dissimilarity, Hamming distance showed the greatest improvement in performance and was therefore used for further evaluation. The performance of the implemented algorithms is compared with that of the state-of-the-art MLkNN algorithm; they showed an improvement for some datasets only. This chapter introduces the parameters MLE and skew, which, together with an outlier parameter, help analyze the multi-label and imbalanced nature of datasets. Investigation of the datasets over these parameters, together with experimentation, exposed the need for data preprocessing to remove outliers. Doing so improved the performance of the implemented algorithms on all measures, and the effectiveness is empirically validated.
Chapter Preview

Background

Related work on multi-label classification and label imbalance is presented here. For multi-label classification, there exist methods that follow the problem transformation approach: they change multi-label data so that methods for single-label classification can be applied. Alternatively, the multi-label data may be left unmodified, and algorithm adaptation methods instead modify the learning process to deal with such data directly. There also exists an approach that ensembles multiple existing methods. CC (Read, 2009), MLkNN (Zhang & Zhou, 2007), and RAkEL (Tsoumakas et al., 2011) are examples of these three approaches, respectively; a sketch of the transformation approach follows.
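To make the transformation approach concrete, the Python sketch below applies binary relevance, the simplest problem transformation: one binary classifier is trained per label, and a new instance receives every label whose classifier fires. The synthetic data and the choice of logistic regression are illustrative assumptions, not part of the chapter's method.

# Binary relevance: transform one multi-label problem into L single-label
# (binary) problems, one per label column of Y.
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression

# Synthetic multi-label data (an assumption for illustration): Y is an
# n_samples x n_labels binary indicator matrix.
X, Y = make_multilabel_classification(n_samples=200, n_features=10,
                                      n_classes=5, random_state=0)

classifiers = []
for j in range(Y.shape[1]):
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X, Y[:, j])                      # one binary problem per label
    classifiers.append(clf)

# The predicted label set of a new instance is the union of the
# per-label positive decisions.
x_new = X[:1]
predicted = [j for j, clf in enumerate(classifiers) if clf.predict(x_new)[0] == 1]
print(predicted)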

For the past few decades, many researchers have worked in the field of multi-label classification (Tsoumakas & Katakis, 2007; Tsoumakas et al., 2009; Trohidis et al., 2008; Tsoumakas et al., 2010; Madjarov et al., 2012; Zhang & Zhou, 2014; Tidake & Sane, 2018). The k nearest neighbor algorithm has also been the choice of many researchers for multi-label classification. The study shows that neighbors are invariably obtained using features alone. The scenario is different for multi-label data, however: each instance belongs to a predefined set of labels, so labels can be considered along with features when obtaining neighbors.

Zhang and Zhou (2007) discuss an approach that follows the algorithm adaptation strategy. It is an improved version of the well-known nearest neighbor algorithm, and several researchers use it to perform multi-label classification. It utilizes feature similarity to determine the nearest neighbors (Zhang & Zhou, 2005; Zhang & Zhou, 2007; Spyromitros-Xioufis et al., 2008). In multi-label classification, since instances are associated with multiple labels, label dissimilarity may also help determine the set of nearest neighbors, as sketched below.
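As a rough illustration of how label dissimilarity could complement feature similarity, the sketch below scores candidate neighbors with a weighted sum of normalized Euclidean feature distance and Hamming label dissimilarity. The equal weighting and the scoring function are assumptions made for illustration only; they do not reproduce the published MLFLD or MLFLD-MAXP procedures.

import numpy as np

def combined_neighbors(X, Y, i, k=3, alpha=0.5):
    # Euclidean distance in feature space, scaled to [0, 1].
    feat = np.linalg.norm(X - X[i], axis=1)
    feat = feat / (feat.max() or 1.0)
    # Hamming dissimilarity between binary label vectors: the fraction
    # of label positions in which two instances differ.
    lab = (Y != Y[i]).mean(axis=1)
    # Weighted combination (alpha = 0.5 is an illustrative assumption).
    score = alpha * feat + (1 - alpha) * lab
    order = np.argsort(score)
    return order[order != i][:k]             # k nearest, excluding i itself

# Toy data: 4 instances, 2 features, 3 labels.
X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.2, 0.1]])
Y = np.array([[1, 0, 1], [1, 0, 1], [0, 1, 0], [1, 1, 1]])
print(combined_neighbors(X, Y, i=0))         # [1 3 2]

Here instance 1 ranks first because it is both close in feature space and identical in its label set, while instance 2 is pushed last by its completely different labels.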
