An Intuitionistic Fuzzy Approach With Rough Entropy Measure to Detect Outliers in Two Universal Sets

Sangeetha T. (School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India) and Geetha Mary A. (School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India)
Copyright: © 2020 | Pages: 18
DOI: 10.4018/IJFSA.2020070105

Abstract

Data mining is the process of recognizing patterns and collecting knowledge from massive databases. Objects that deviate from other objects in their characteristics or behavior are known as outliers. Research on outlier detection so far has focused only on numerical data, categorical data, and single universal sets. The main goal of this article is to detect significant outliers in two universal sets by applying the intuitionistic fuzzy cut relation based on membership and non-membership values. The proposed method, weighted density outlier detection, is based on rough entropy. Since it is unsupervised, weighted density values for all conditional attributes and objects are calculated without considering the class labels of decision attributes. For experimental analysis, the Iris dataset from the UCI repository is used, and comparisons are made with existing algorithms to demonstrate the method's efficiency.

Introduction

Data may be represented as text, numbers, images, or pictures that can be easily recognized and handled by a system, and they may come in different formats, dimensions, and styles. A person who collects data should know its purpose, how to handle it, and which techniques to use to mine it. Machine learning algorithms are used to generate patterns from large databases (Arning, Agrawal & Raghavan, 1996). The main objective of data mining is to convert extracted patterns into understandable structures. Broadly, it involves two steps: pre-processing and post-processing. The pre-processing stage includes an analysis of the raw data, its complexity, and its metrics; it also helps in identifying missing data, null values, and noise. Post-processing provides data visualization in terms of statistics. Data mining is also termed knowledge discovery in databases (Aggarwal & Yu, 2001).

Among data mining tasks such as outlier detection, regression, association, classification, and clustering, this article provides a detailed analysis of anomaly detection. Objects that deviate from other objects in their behavior or characteristics are outliers (Chandola, Banerjee & Kumar, 2011). Clustering and outlier detection are closely related tasks. Clustering techniques group objects based on a proximity relation, whereas outliers are the objects that deviate significantly from the others (Zhang, Ramakrishnan & Livny, 1996).

Several techniques are used to detect outliers, including statistical, distance-based, and density-based methods. Statistical methods fit a distribution or probability model to the data; distance-based techniques identify as outliers the objects that lie far away from the clusters (Breunig, Kriegel & Sander, 2000); and density-based measures detect objects in a local region that deviate significantly from their neighbors.
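As a rough illustration of the distance-based idea described above, the following sketch scores each object by the distance to its k-th nearest neighbor and flags the object with the largest score. This is not the method proposed in this article; the function name `knn_distance_scores`, the choice of k, and the toy points are assumptions made only for the example.

```python
import math


def knn_distance_scores(points, k=2):
    """Score each point by the Euclidean distance to its k-th nearest neighbour.

    A larger score means the point lies farther from the rest of the data.
    Hypothetical helper, written only to illustrate the distance-based idea.
    """
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


# Toy data (assumed): four points close together and one far away.
points = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.2), (1.2, 1.1), (8.0, 8.0)]
scores = knn_distance_scores(points, k=2)

# Index of the point with the largest k-th neighbour distance.
outlier_index = max(range(len(points)), key=scores.__getitem__)
print(outlier_index, points[outlier_index])
```

On this toy data the isolated point (8.0, 8.0) receives the largest score. A density-based method would instead compare each point's local density with that of its neighbors.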

Today, data are collected in large volumes, so analyzing a dataset and reducing its size is a tedious task. It is impossible to do manually, so software assistance is needed. Available data are often not well defined and depend on customer requirements. To use them, we have to transform the available data into the necessary format (Mary, Acharjya & Iyengar, 2014). Not all data are perfect: incomplete, inconsistent, and vague data also exist in the real world. Existing tools can solve problems in which the data are crisp or deterministic, but they fail when the data are vague, uncertain, or imprecise.

Rough set theory is a well-known mathematical tool for analyzing the uncertainty and vagueness of data. It identifies indiscernible elements within a set. Sometimes the relation extends to other universes, for example to two universal sets C and D, where lower and upper approximations are also employed (Liu, 2010). Every element of a set in the universe holds some information, and attributes are used to define an object's behavior and characteristics. Rough set theory relies on an equivalence relation, which is not suitable in every environment; it can be enhanced by generalizing it to a fuzzy environment.
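To make the lower and upper approximations concrete, here is a minimal sketch for the classical single-universe case, where objects with identical attribute values are indiscernible. The helper names `equivalence_classes` and `approximations` and the small information table are assumptions for illustration; the two-universe setting discussed in this article generalizes this construction.

```python
from collections import defaultdict


def equivalence_classes(table, attributes):
    """Group objects that share the same values on the chosen attributes."""
    classes = defaultdict(set)
    for obj, row in table.items():
        key = tuple(row[a] for a in attributes)
        classes[key].add(obj)
    return list(classes.values())


def approximations(table, attributes, target):
    """Return the (lower, upper) approximation of the target set of objects."""
    lower, upper = set(), set()
    for block in equivalence_classes(table, attributes):
        if block <= target:   # block lies entirely inside the target
            lower |= block
        if block & target:    # block overlaps the target
            upper |= block
    return lower, upper


# Toy information table (assumed): x1 and x2 are indiscernible.
table = {
    "x1": {"colour": "red", "size": "small"},
    "x2": {"colour": "red", "size": "small"},
    "x3": {"colour": "blue", "size": "large"},
    "x4": {"colour": "blue", "size": "small"},
}
target = {"x1", "x3"}
lower, upper = approximations(table, ["colour", "size"], target)
print(sorted(lower), sorted(upper))
```

Here x3 is certainly in the target set (lower approximation), while x1 and x2 cannot be told apart, so both appear only in the upper approximation. The gap between the two approximations is what makes the set rough.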

The intuitionistic fuzzy set provides solutions for decision-making systems over two universal sets and helps achieve optimal solutions with respect to precision and rough degree. For two nonempty universal sets C and D, the intuitionistic fuzzy cut relation provides a binary relation. Usually, fuzzy sets deal with interval values rather than real numbers. A notion of intuitionistic fuzzy sets with interval values, known as interval-valued intuitionistic fuzzy sets, is used to make decisions when multiple attributes exist (Ren, Chen, Fei & Li, 2017). Researchers have developed non-linear models for deciding among various attributes by measuring them with weighted Euclidean distances and relative closeness coefficients (Li, 2010).
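The cut relation mentioned above can be sketched as follows. The preview does not show the article's exact definition, so this example assumes the commonly used (α, β)-cut convention: a pair (c, d) is kept when its membership degree is at least α and its non-membership degree is at most β. The function name `cut_relation`, the thresholds, and the degrees are all hypothetical.

```python
def cut_relation(relation, alpha, beta):
    """Return the crisp pairs (c, d) with membership >= alpha and non-membership <= beta.

    `relation` maps pairs (c, d) from C x D to (membership, non_membership).
    Assumed (alpha, beta)-cut convention; not taken from the article itself.
    """
    pairs = set()
    for (c, d), (mu, nu) in relation.items():
        # Intuitionistic fuzzy degrees must be non-negative and sum to at most 1.
        if mu < 0.0 or nu < 0.0 or mu + nu > 1.0 + 1e-9:
            raise ValueError(f"invalid degrees for {(c, d)}: {(mu, nu)}")
        if mu >= alpha and nu <= beta:
            pairs.add((c, d))
    return pairs


# Toy intuitionistic fuzzy relation between C = {c1, c2} and D = {d1, d2} (assumed values).
relation = {
    ("c1", "d1"): (0.8, 0.1),
    ("c1", "d2"): (0.4, 0.5),
    ("c2", "d1"): (0.6, 0.3),
    ("c2", "d2"): (0.2, 0.7),
}
crisp = cut_relation(relation, alpha=0.5, beta=0.3)
print(sorted(crisp))
```

With these thresholds only the pairs (c1, d1) and (c2, d1) survive, which yields an ordinary binary relation between C and D.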
