Feature Selection of Interval Valued Data Through Interval K-Means Clustering

D. S. Guru (Department of Studies in Computer Science, University of Mysore, Mysore, India), N. Vinay Kumar (Department of Studies in Computer Science, University of Mysore, Mysore, India) and Mahamad Suhil (Department of Studies in Computer Science, University of Mysore, Mysore, India)
Copyright: © 2017 | Pages: 17
DOI: 10.4018/IJCVIP.2017040105

Abstract

This paper introduces a novel feature selection model for supervised interval-valued data based on interval K-means clustering. The proposed model explores two kinds of feature selection through feature clustering, viz., class-independent feature selection and class-dependent feature selection. The former clusters the features across the samples of all classes, whereas the latter clusters the features across only the samples of each respective class. Both feature selection models are demonstrated in order to explore the generality of clustering in selecting interval-valued features. For clustering, the kernel of the K-means algorithm has been altered to operate on interval-valued data. For experimentation, four standard benchmark datasets and three symbolic classifiers have been used. To corroborate the effectiveness of the proposed model, a comparative analysis against state-of-the-art models is given, and the results show the superiority of the proposed model.

Introduction

In the current era of digital technology, pattern recognition plays a vital role in the development of cognition-based systems. These systems naturally handle huge amounts of data. With such vast amounts of data, the processing task itself becomes a curse. To overcome this curse, researchers have adopted the concept of feature selection. Feature selection is now a very active topic in machine learning and pattern recognition, as it selects the most relevant and non-redundant subset from a given set of features. Feature selection techniques are broadly classified into filter, wrapper, and embedded methods (Artur et al., 2012).

Generally, the existing conventional feature selection methods (Artur et al., 2012) fail to perform feature selection on unconventional data such as interval, multi-valued, modal, and categorical data. Such data are collectively called symbolic data. The notion of symbolic data emerged in the early 2000s and concentrates on handling realistic types of data for effective classification, clustering, and even regression (Lynne and Edwin, 2007). As symbolic data analysis is a powerful tool for solving problems in a natural way, we set out to develop a feature selection model for one of these modalities. We chose interval-valued data because of its strength in preserving continuous streaming data in discrete form (Lynne and Edwin, 2007). Thus, in this work we build a feature selection model for interval-valued data.
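
The feature-clustering idea summarised in the abstract can be illustrated with a minimal sketch. This preview does not specify the altered K-means kernel, so everything below is an assumption made for illustration: the distance (Euclidean over lower and upper bounds), the centroid (mean of lower and upper bounds), the deterministic initialisation, the rule of keeping the feature nearest each centroid, and all function names. It is not the authors' implementation.

```python
from math import sqrt

# A sample is a list of intervals (lower, upper), one per feature.
Interval = tuple[float, float]


def interval_distance(a: list[Interval], b: list[Interval]) -> float:
    """Euclidean distance over lower and upper bounds (an assumed kernel)."""
    return sqrt(sum((al - bl) ** 2 + (au - bu) ** 2
                    for (al, au), (bl, bu) in zip(a, b)))


def centroid(vectors: list[list[Interval]]) -> list[Interval]:
    """Component-wise mean of the lower bounds and of the upper bounds."""
    n = len(vectors)
    return [(sum(v[i][0] for v in vectors) / n,
             sum(v[i][1] for v in vectors) / n)
            for i in range(len(vectors[0]))]


def select_features(samples, k, max_iter=100):
    """Class-independent selection: cluster the feature columns into k
    groups and keep the feature nearest each centroid (assumes k <= number
    of features and max_iter >= 1)."""
    # Transpose: each feature becomes a vector of intervals, one per sample.
    features = [list(col) for col in zip(*samples)]
    # Assumption: the first k features seed the centroids (deterministic).
    centroids = [features[i] for i in range(k)]
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(k), key=lambda c: interval_distance(f, centroids[c]))
            for f in features
        ]
        if new_assignment == assignment:
            break  # converged: no feature changed cluster
        assignment = new_assignment
        for c in range(k):
            members = [f for f, a in zip(features, assignment) if a == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = centroid(members)
    selected = []
    for c in range(k):
        members = [j for j, a in enumerate(assignment) if a == c]
        if members:
            selected.append(min(
                members,
                key=lambda j: interval_distance(features[j], centroids[c])))
    return sorted(selected)


def select_features_per_class(samples, labels, k):
    """Class-dependent selection: repeat the clustering on each class's
    samples only, giving one feature subset per class."""
    return {
        label: select_features(
            [s for s, y in zip(samples, labels) if y == label], k)
        for label in sorted(set(labels))
    }
```

Calling `select_features(samples, k)` returns one list of feature indices shared by all classes, while `select_features_per_class(samples, labels, k)` returns a dictionary mapping each class label to its own list, which mirrors the class-independent and class-dependent variants described above.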
