Neighborhood-Based Classification of Imprecise Data


Sampath Sundaram (Kalasalingam Academy of Higher Education and Research, Srivilliputhur, India) and Miriam Kalpana Simon (Madras Christian College, India)
DOI: 10.4018/978-1-7998-0190-0.ch004


This chapter considers k-nn classification of objects whose values on the attributes under consideration are imprecise in nature. To handle such imprecise data values, two approaches for converting fuzzy data sets into crisp form are considered; both are borrowed from Liu's credibility theory. A comparative study on the choice of approach for crisp conversion of fuzzy data has been carried out using simulated multivariate data sets, and the conclusions drawn from the study are presented.
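This preview does not state which two crisp-conversion approaches the chapter uses. As one illustration only, assume each attribute value is a triangular fuzzy number (a, b, c) with a ≤ b ≤ c. Under Liu's credibility theory, the expected value of such a fuzzy variable is E[ξ] = (a + 2b + c)/4, which can serve as a crisp surrogate before a k-nn classifier is applied. The representation and the function names below are hypothetical.

```python
from typing import List, Tuple

# Assumed representation: a triangular fuzzy number (a, b, c) with a <= b <= c.
Triangular = Tuple[float, float, float]


def credibility_expected_value(t: Triangular) -> float:
    """Expected value of a triangular fuzzy variable under the credibility
    measure: E[xi] = (a + 2b + c) / 4."""
    a, b, c = t
    if not (a <= b <= c):
        raise ValueError("expected a <= b <= c")
    return (a + 2 * b + c) / 4.0


def crisp_convert(rows: List[List[Triangular]]) -> List[List[float]]:
    """Replace every fuzzy attribute value of every object by its expected value."""
    return [[credibility_expected_value(v) for v in row] for row in rows]


# One object with two fuzzy attributes.
print(crisp_convert([[(1.0, 2.0, 3.0), (0.0, 4.0, 4.0)]]))  # [[2.0, 3.0]]
```

The crisp matrix returned by `crisp_convert` can then be passed to any ordinary k-nn routine.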

1. Introduction

Classification is one of the major branches of data mining. While clustering algorithms attempt to identify natural subgroups in a data set, classifiers aim to assign a test set object to one of a set of pre-identified classes. Various approaches to classification are available in the literature. Among them, neighborhood-based methods, decision trees, naïve Bayesian classifiers and support vector machines are widely studied non-parametric classifiers. Neighborhood-based methods (Fix and Hodges, 1951; Cover and Hart, 1967) such as the k-nearest-neighbor rule assign a test set object to a class based on the class memberships of its k nearest neighbors among the training objects. Here k is a pre-determined, user-specified positive integer. Decision trees, which are built using various measures of information or impurity, classify a test object by traversing their branches from the root to a leaf. The naïve Bayesian classifier applies Bayes' theorem under the assumption of class-conditional independence to produce posterior class probabilities, which are then used for classification. Details of these two classifiers can be found in Tan, Steinbach and Kumar (2006). Support vector machines (Cortes and Vapnik, 1995), which are based on sound mathematical principles, perform extremely well on various benchmark datasets. Apart from these, several classifiers grounded in statistical theory, such as Fisher's linear discriminant function and quadratic discriminant functions (Johnson and Wichern, 2008), are also available in the literature.
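The k-nearest-neighbor rule described above can be sketched as follows. This is a minimal illustration assuming crisp numeric attributes, Euclidean distance and a simple majority vote; the function name `knn_classify` is hypothetical and not taken from the chapter.

```python
import math
from collections import Counter
from typing import List, Sequence


def knn_classify(train_X: List[Sequence[float]], train_y: List[str],
                 x: Sequence[float], k: int = 3) -> str:
    """Assign x to the majority class among its k nearest training objects."""
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the training set size")
    # Order training objects by Euclidean distance to x and keep the k closest.
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    # Counter keeps first-seen order for equal counts, so a tie goes to the
    # class whose nearest representative is closest to x.
    return votes.most_common(1)[0][0]


train_X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
train_y = ["A", "A", "A", "B", "B", "B"]
print(knn_classify(train_X, train_y, (0.5, 0.5), k=3))  # A
```

Euclidean distance is only one possible choice; any other dissimilarity measure could be substituted in the sort key.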

The classifiers mentioned above are predominantly meant for datasets consisting of crisp values. To be precise, the values taken by the objects (in both the training set and the test set) on the variables under consideration are assumed to be crisp, that is, free from any kind of imprecision, whether subjective or objective. Even though the majority of classifiers assign a test object to one of the predefined classes in a precise (crisp) manner, researchers have attempted to design classifiers that allow membership to be assigned in an imprecise manner. Such classifiers become relevant in real-life situations, particularly when a clear-cut assignment to a single class is difficult. This type of problem arises in several areas of scientific investigation. For example, a medical practitioner often finds it difficult to assign a patient to one of several possible diseases on the basis of clinical symptoms and diagnostic tests. In such cases it appears prudent to associate the patient with the groups corresponding to the various ailments through different membership values, which indicate the degree to which the patient belongs to each group.
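Graded membership of this kind can be illustrated with a distance-weighted variant of k-nn in the style of fuzzy k-nn. This is an assumed illustration, not necessarily the classifier developed in the chapter: each of the k nearest neighbors votes for its own crisp class with weight 1/d^(2/(m−1)), where the exponent parameter m > 1 and the function name `knn_memberships` are hypothetical choices, and the votes are normalized so that the memberships sum to 1.

```python
import math
from typing import Dict, List, Sequence


def knn_memberships(train_X: List[Sequence[float]], train_y: List[str],
                    x: Sequence[float], k: int = 3, m: float = 2.0) -> Dict[str, float]:
    """Graded class memberships of x computed from its k nearest neighbors."""
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the training set size")
    if m <= 1:
        raise ValueError("m must be greater than 1")
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    # Every class starts at zero so that classes with no nearby neighbor still appear.
    weights: Dict[str, float] = {label: 0.0 for label in train_y}
    for i in nearest:
        # Guard against division by zero when x coincides with a training object.
        d = max(math.dist(train_X[i], x), 1e-12)
        weights[train_y[i]] += 1.0 / d ** (2.0 / (m - 1.0))
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


print(knn_memberships([(0.0, 0.0), (2.0, 0.0)], ["A", "B"], (0.5, 0.0), k=2))
```

Here the test point lies at distance 0.5 from the class A object and 1.5 from the class B object, so with m = 2 it receives membership of about 0.9 in A and 0.1 in B, rather than a single crisp label.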

