Evolutionary Approach to Dimensionality Reduction

Amit Saxena (Guru Ghasidas University, Bilaspur, India), Megha Kothari (St. Peter’s University, Chennai, India) and Navneet Pandey (Indian Institute of Technology, Delhi, India)
Copyright: © 2009 | Pages: 7
DOI: 10.4018/978-1-60566-010-3.ch125

Abstract

The excess of data produced by high-volume storage and online devices has become a bottleneck in the search for meaningful information: we are rich in information but poor in knowledge. One of the major problems in extracting knowledge from large databases is their dimensionality, i.e., the number of features. More often than not, some features have no effect on the performance of a classifier. Other features may be detrimental and degrade the performance of the classifiers applied subsequently, which motivates dimensionality reduction (DR). A data set can thus contain redundant features, bad features, and highly correlated features. Removing such features not only improves the performance of the system but also makes the learning task much simpler. Data mining, a multidisciplinary joint effort of databases, machine learning, and statistics, is leading the way in turning mountains of data into nuggets (Mitra, Murthy, & Pal, 2002).

Introduction

Feature Analysis

DR is achieved through feature analysis, which comprises feature selection (FS) and feature extraction (FE). FS refers to selecting the best subset of the input feature set, whereas FE creates new features by transforming or combining the original features. Both FS and FE can be carried out with supervised or unsupervised approaches. In a supervised approach, the class label of each data pattern is given, and the selection process uses this knowledge to evaluate classification accuracy. In unsupervised FS, class labels are not given, and the process relies on the natural clustering of the data.
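
The distinction between FS and FE can be illustrated with a minimal sketch. It is not taken from the chapter: the toy data values, the function names select_features and extract_features, and the weights are all hypothetical, and the weighted sum stands in for whatever transformation an actual FE method would learn.

```python
# Toy data: each row is one pattern with four features (values are made up).
patterns = [
    [5.1, 3.5, 1.4, 0.2],
    [6.2, 2.9, 4.3, 1.3],
    [7.7, 3.0, 6.1, 2.3],
]


def select_features(rows, keep):
    """Feature selection: keep a subset of the original columns unchanged."""
    return [[row[i] for i in keep] for row in rows]


def extract_features(rows, weights):
    """Feature extraction: each new feature is a weighted sum of the originals."""
    return [
        [sum(w * x for w, x in zip(ws, row)) for ws in weights]
        for row in rows
    ]


# FS: features 0 and 2 survive as they are; features 1 and 3 are discarded.
selected = select_features(patterns, keep=[0, 2])

# FE: two new features, each a combination of original features
# (hypothetical weights chosen only for illustration).
extracted = extract_features(
    patterns,
    weights=[[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 1.0, -1.0]],
)
```

Both calls reduce four features to two, but the selected features keep their original meaning, while the extracted ones are new quantities derived from several original features.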

Background

Feature Selection (FS)

The main task of FS is to select the most discriminatory features from the original feature set, lowering the dimension of the pattern space on the basis of the information contained in the feature samples. Ho (1998) constructed and combined multiple classifiers, each using randomly selected features, which can achieve better classification performance than using the complete set of features. The only way to guarantee the selection of an optimal feature vector is an exhaustive search of all possible subsets of features (Zhang, Verma, & Kumar, 2005).
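
To make the cost of exhaustive search concrete: a set of d features has 2^d - 1 non-empty subsets, so the number of candidates doubles with every added feature. The sketch below enumerates all of them for a tiny supervised problem. It is only an illustration under stated assumptions: the toy data, the function names, and the choice of leave-one-out 1-nearest-neighbour accuracy as the evaluation criterion are hypothetical and are not taken from the chapter or the cited works.

```python
from itertools import combinations


def loo_1nn_accuracy(X, y, subset):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on `subset`."""
    correct = 0
    for i, xi in enumerate(X):
        best_j, best_d = None, float("inf")
        for j, xj in enumerate(X):
            if i == j:
                continue  # the held-out pattern may not vote for itself
            d = sum((xi[k] - xj[k]) ** 2 for k in subset)
            if d < best_d:
                best_j, best_d = j, d
        correct += y[best_j] == y[i]
    return correct / len(X)


def exhaustive_feature_search(X, y, score=loo_1nn_accuracy):
    """Score every non-empty feature subset; on ties the earlier (smaller) subset wins."""
    n_features = len(X[0])
    best_subset, best_score, evaluated = None, -1.0, 0
    for size in range(1, n_features + 1):
        for subset in combinations(range(n_features), size):
            evaluated += 1
            s = score(X, y, subset)
            if s > best_score:
                best_subset, best_score = subset, s
    return best_subset, best_score, evaluated


# Made-up data: feature 0 separates the two classes, features 1 and 2 do not.
X = [
    [0.1, 5.0, 0.9],
    [0.2, 1.0, 0.1],
    [0.3, 3.0, 0.5],
    [0.9, 4.8, 0.2],
    [1.0, 1.2, 0.8],
    [1.1, 3.1, 0.4],
]
y = [0, 0, 0, 1, 1, 1]

best_subset, best_score, evaluated = exhaustive_feature_search(X, y)
```

With three features the search evaluates only seven subsets, but with 30 features it would already need more than a billion evaluations, which is why exhaustive search is rarely practical despite its optimality guarantee.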
