An Uncertainty-Based Model for Optimized Multi-Label Classification

J. Anuradha (VIT University, India) and B. K. Tripathy (VIT University, India)
Copyright: © 2015 |Pages: 34
DOI: 10.4018/978-1-4666-8291-7.ch002

Abstract

Data used in real-world applications are often uncertain and vague. Several models have been proposed to handle such data efficiently, and each has its own strengths and weaknesses. Efforts have therefore been made to combine these models so that the hybrid models capitalize on the strengths of their constituents. In 1990, Dubois and Prade combined rough sets and fuzzy sets to develop two models, of which the rough fuzzy model is a popular one that is used in many fields to handle uncertainty-based data sets. Combining Particle Swarm Optimization (PSO) with the rough fuzzy model is expected to produce optimized solutions. Multi-label classification, in the context of data mining, deals with situations where an object or a set of objects can be assigned to multiple classes. In this chapter, the authors present a rough fuzzy PSO algorithm that classifies multi-label data sets, and its efficiency and superiority are established through experimental analysis.

Introduction

In the past few years, although information retrieval has become easier, retrieving relevant information has become a challenge because the data available in repositories grow exponentially. Retrieval of meaningful and appropriate data is crucial for good decision making, and it can be achieved with an efficient knowledge discovery or data mining tool. Classification, clustering and feature selection are some of the popular mechanisms that help in analyzing data to identify hidden patterns. Classification is a supervised technique that assigns patterns to classes based on sample data using standard algorithms. The classification problem becomes complex when there is a large number of possible pattern combinations. Rule generation by tree or other induction techniques has difficulty producing unambiguous, optimized rules from complex, vague data. Clustering is a popular unsupervised learning technique that partitions data into clusters whose members have similar characteristics. The challenge in clustering is to identify sharply the dissimilarity and the degree of similarity that separate the data into different groups. Dynamic and adaptable algorithms are important for forming good clusters. Feature selection is the problem of retaining the essential data and discarding the irrelevant information from the given inputs. It is an important preprocessing step that can enhance or reduce the performance of knowledge discovery. The self-adaptive nature of evolutionary algorithms handles these problems in a simple way and produces the best solution from large data sets.

Modern optimization techniques have aroused great interest among the scientific and technical community in a wide variety of fields recently, because of their ability to solve problems with a non-linear and non-convex dependence of design parameters. Several new optimization techniques have emerged in the past two decades that mimic biological evolution, or the way biological entities communicate in nature. The most representative algorithms include Genetic Algorithms (GA), Particle Swarm Optimization (PSO) and the method of Differential Evolution (DE).

By analogy with natural selection and evolution, in a classical GA the set of parameters to be optimized (genes) defines an individual or potential solution X (chromosome), and a set of individuals makes up the population, which is evolved by means of the selection, crossover and mutation genetic operators. The optimization process used by the GA proceeds as follows. The genetic algorithm generates individuals (amplitude excitations and phase perturbations of the antenna elements). Each individual is encoded as a vector of real numbers that represents the amplitudes and a vector of real numbers restricted to the range (0, 2π) that represents the phase perturbations of the antenna elements.
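
The selection, crossover and mutation loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the chapter's implementation: the function name `ga_minimize`, the tournament selection, arithmetic crossover, Gaussian mutation, the parameter values and the `sphere` test objective are all hypothetical choices made for brevity. An antenna application would substitute its own objective and encode amplitudes and phases in the real-valued vector.

```python
import random


def sphere(x):
    """Stand-in objective (assumption): sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)


def ga_minimize(objective, dim, bounds, pop_size=30, n_gens=100,
                crossover_rate=0.9, mutation_rate=0.1, seed=0):
    """Minimal real-coded GA sketch with elitism; all defaults are illustrative."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Each individual (chromosome) is a vector of real-valued genes.
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]

    def tournament(scored):
        # Binary tournament selection: pick two at random, keep the fitter one.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] <= b[0] else b[1]

    for _ in range(n_gens):
        scored = sorted(((objective(ind), ind) for ind in pop),
                        key=lambda t: t[0])
        next_pop = [scored[0][1][:]]  # elitism: carry the best individual over
        while len(next_pop) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            if rng.random() < crossover_rate:
                # Arithmetic crossover: a random blend of the two parents.
                alpha = rng.random()
                child = [alpha * x + (1 - alpha) * y for x, y in zip(p1, p2)]
            else:
                child = p1[:]
            for d in range(dim):
                if rng.random() < mutation_rate:
                    # Gaussian mutation, clamped to the search bounds.
                    step = rng.gauss(0, 0.1 * (hi - lo))
                    child[d] = min(hi, max(lo, child[d] + step))
            next_pop.append(child)
        pop = next_pop

    best = min(pop, key=objective)
    return best, objective(best)


best, value = ga_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
print(best, value)
```

Because the best individual is copied unchanged into each new generation, the best objective value in this sketch can never get worse from one generation to the next. This is the elitism-based notion of memory in a GA.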

One of the main drawbacks of GAs is their lack of memory, which limits the search and convergence ability of the algorithms. In a GA, the concept of memory relies on elitism, but there is no stronger operator to propagate accurate solutions more quickly. In contrast, PSO is a powerful stochastic optimization method inspired by the social behavior of organisms, such as bird flocking or fish schooling, in which individuals have memory and cooperate to move towards a region containing the global or a near-optimal solution. Like other evolutionary algorithms, PSO is an optimization technique that performs a randomized search of the solution space for an optimal solution by updating generations.
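
The role of memory and cooperation in PSO can be illustrated with a small sketch of the commonly used global-best variant with an inertia weight. It is not the rough fuzzy PSO algorithm of this chapter. The function name `pso_minimize`, the coefficient values and the `sphere` objective are assumptions chosen for illustration only.

```python
import random


def sphere(x):
    """Stand-in objective (assumption): sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)


def pso_minimize(objective, dim, bounds, n_particles=20, n_iters=200,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal global-best PSO sketch; all defaults are illustrative."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]

    # Memory: each particle remembers its own best position (pbest),
    # and the swarm shares the best position found so far (gbest).
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])  # own memory
                             + c2 * r2 * (gbest[d] - pos[i][d]))    # cooperation
                # Move the particle and clamp it to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


best, value = pso_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
print(best, value)
```

The two attraction terms in the velocity update are what distinguish PSO from the GA described earlier: every particle is pulled towards both its own remembered best position and the best position known to the swarm.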

Key Terms in this Chapter

Feature: An attribute in a data table.

Rough Set: A set in which the uncertainty is captured in the boundary region. It is approximated by a pair of crisp sets called the lower and upper approximation of the set.

Classification: A process similar to clustering, except that it is a supervised learning approach, whereas clustering is an unsupervised one.

Fuzzy Set: A set in which the degree to which each element belongs to the set is given by a membership function taking values between 0 and 1.

Optimization: The process of finding the optimal (maximum or minimum) value of a function, called the objective function, subject to certain constraints.

Genetic Algorithm: A search and optimization algorithm capable of searching large solution spaces to find optimal solutions using methods modeled on natural selection.

Clustering: A process to divide a set of data into groups called clusters, where the elements inside a group have higher similarity to each other than the similarity between elements of different groups.

Artificial Neural Network: A network of artificial neurons, modeled on biological neurons, that is designed to mimic the human brain.
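
As a small illustration of the Rough Set entry above, the lower and upper approximations of a target set can be computed from the equivalence classes induced by the attribute values. This is a minimal sketch under stated assumptions: the function name `approximations` and the toy data table are hypothetical and are not taken from the chapter.

```python
from collections import defaultdict


def approximations(objects, target):
    """Return (lower, upper) approximations of `target`.

    objects: dict mapping object id -> tuple of attribute values.
    target:  set of object ids to approximate.
    Objects with identical attribute tuples are indiscernible and
    form one equivalence class.
    """
    classes = defaultdict(set)
    for obj, attrs in objects.items():
        classes[attrs].add(obj)

    lower, upper = set(), set()
    for block in classes.values():
        if block <= target:   # class lies entirely inside the target
            lower |= block
        if block & target:    # class overlaps the target
            upper |= block
    return lower, upper


# Toy data table (assumption): two attributes per object.
table = {
    1: ("high", "yes"),
    2: ("high", "yes"),
    3: ("low", "no"),
    4: ("low", "yes"),
    5: ("low", "no"),
}
lower, upper = approximations(table, target={1, 2, 3})
print(sorted(lower))          # [1, 2]
print(sorted(upper))          # [1, 2, 3, 5]
print(sorted(upper - lower))  # boundary region: [3, 5]
```

Objects 3 and 5 share the same attribute values, yet only object 3 is in the target set, so both fall into the boundary region where the uncertainty is captured.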
