Estimation of Distribution Algorithms for Feature Subset Selection in Large Dimensionality Domains

Inaki Inza (University of the Basque Country, Spain), Pedro Larranaga (University of the Basque Country, Spain) and Basilio Sierra (University of the Basque Country, Spain)
Copyright: © 2002 | Pages: 20
DOI: 10.4018/978-1-930708-25-9.ch005

Abstract

Feature Subset Selection (FSS) is a well-known task in the Machine Learning, Data Mining, Pattern Recognition and Text Learning paradigms. Genetic Algorithms (GAs) are possibly the most commonly used algorithms for Feature Subset Selection tasks. Although the FSS literature contains many papers, few of them tackle the task of FSS in domains with more than 50 features. In this chapter we present a novel search heuristic paradigm, called Estimation of Distribution Algorithms (EDAs), as an alternative to GAs, to perform a population-based and randomized search in datasets of large dimensionality. The EDA paradigm avoids the use of genetic crossover and mutation operators to evolve the populations. In the absence of these operators, the evolution is guaranteed by the factorization of the probability distribution of the best solutions found in a generation of the search and the subsequent simulation of this distribution to obtain a new pool of solutions. In this chapter we present four different probabilistic models to perform this factorization. In a comparison with two types of GAs on natural and artificial datasets of large dimensionality, EDA-based approaches obtain encouraging results with regard to accuracy, and they needed fewer evaluations than the genetic approaches.
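
The loop described in the abstract (keep the best solutions of a generation, estimate their probability distribution, then simulate that distribution to obtain a new pool) can be sketched in a few lines of Python. This is only an illustrative sketch, not the chapter's implementation: it assumes the simplest possible factorization, in which every feature is treated as independent (a univariate model), and it may or may not correspond to one of the four models the chapter presents. The names `umda_feature_selection` and `toy_fitness`, the set `RELEVANT`, and all parameter values are hypothetical; in a real FSS setting the fitness would typically be an estimate of classifier accuracy on the selected features.

```python
import random

def umda_feature_selection(fitness, n_features, pop_size=50, n_select=25,
                           generations=30, seed=0):
    """Univariate EDA sketch; each individual is a 0/1 mask over the features."""
    rng = random.Random(seed)
    # Initial model (assumption): every feature is included with probability 0.5.
    probs = [0.5] * n_features
    best_mask, best_score = None, float("-inf")
    for _ in range(generations):
        # Simulate the current distribution to obtain a pool of candidate subsets.
        population = [[1 if rng.random() < p else 0 for p in probs]
                      for _ in range(pop_size)]
        # Evaluate every candidate and rank by score, best first.
        scored = sorted(((fitness(ind), ind) for ind in population),
                        key=lambda pair: pair[0], reverse=True)
        if scored[0][0] > best_score:
            best_score, best_mask = scored[0]
        # Keep the best solutions of this generation.
        selected = [ind for _, ind in scored[:n_select]]
        # Re-estimate one marginal per feature from the selected individuals.
        # Laplace smoothing keeps each probability strictly between 0 and 1.
        probs = [(sum(ind[i] for ind in selected) + 1) / (n_select + 2)
                 for i in range(n_features)]
    return best_mask, best_score

# Hypothetical toy problem: three features are "relevant"; every other
# selected feature costs a small penalty.
RELEVANT = {0, 3, 7}

def toy_fitness(mask):
    hits = sum(1 for i in RELEVANT if mask[i])
    extras = sum(mask) - hits
    return hits - 0.1 * extras

mask, score = umda_feature_selection(toy_fitness, n_features=20)
print(mask, score)
```

No crossover or mutation operator appears anywhere in the loop: new candidates come only from sampling the re-estimated model. A model that captures dependencies between features would replace the per-feature marginals in `probs` with a richer factorization, at the cost of a more expensive estimation step.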

Complete Chapter List

Table of Contents
Acknowledgments
Hussein A. Abbass, Ruhul Sarker, Charles S. Newton
Chapter 1
Vladimir Estivill-Castro, Michael Houle
Distance-based clustering results in optimization problems that typically are NP-hard or NP-complete and for which only approximate solutions are...
Approximating Proximity to Fast and Robust Distance-Based Clustering
Chapter 2
Erick Cantu-Paz
With computers becoming more pervasive, disks becoming cheaper, and sensors becoming ubiquitous, we are collecting data at an ever-increasing pace....
On the Use of Evolutionary Algorithms in Data Mining
Chapter 3
Beatriz de la Iglesia, Victor J. Rayward-Smith
Knowledge Discovery in Databases (KDD) is an iterative and interactive process involving many steps (Debuse, de la Iglesia, Howard & Rayward-Smith...
The Discovery of Interesting Nuggets Using Heuristic Techniques
Chapter 4
Jay T. Rodstein, Katherine S. Watters
Safety and health issues in virtual offices are part of progressive telecommuting programs. Telecommuting agreements between employers and employees...
From Evolution to Immune to Swarm to? A Simple Introduction to Modern Heuristics
Chapter 5
Inaki Inza, Pedro Larranaga, Basilio Sierra
Feature Subset Selection (FSS) is a well-known task of Machine Learning, Data Mining, Pattern Recognition or Text Learning paradigms. Genetic...
Estimation of Distribution Algorithms for Feature Subset Selection in Large Dimensionality Domains
Chapter 6
Jorge Muruzabal
Evolutionary algorithms are by now well-known and appreciated in a number of disciplines including the emerging field of data mining. In the last...
Towards the Cross-Fertilization of Multiple Heuristics: Evolving Teams of Local Bayesian Learners
Chapter 7
Neil Dunstan, Michael de Raadt
Sensing devices are commonly used for the detection and classification of subsurface objects, particularly for the purpose of eradicating Unexploded...
Evolution of Spatial Data Templates for Object Classification
Chapter 8
Peter W.H. Smith
Genetic Programming (GP) has increasingly been used as a data-mining tool. For example, it has successfully been used for decision tree induction...
Genetic Programming as a Data-Mining Tool
Chapter 9
Andries P. Engelbrecht, L. Schoeman, Sonja Rouwhorst
Genetic programming has recently been used successfully to extract knowledge in the form of IF-THEN rules. For these genetic programming approaches...
A Building Block Approach to Genetic Programming for Rule Discovery
Chapter 10
Rafael S. Parpinelli, Heitor S. Lopes, Alex A. Freitas
This work proposes an algorithm for rule discovery called Ant-Miner (Ant Colony-Based Data Miner). The goal of Ant-Miner is to extract...
An Ant Colony Algorithm for Classification Rule Discovery
Chapter 11
Jonathan Timmis, Thomas Knight
The immune system is highly distributed, highly adaptive, self-organising in nature, maintains a memory of past encounters and has the ability to...
Artificial Immune Systems: Using the Immune System as Inspiration for Data Mining
Chapter 12
Leandro Nunes de Castro, Fernando J. Von Zuben
This chapter shows that some of the basic aspects of the natural immune system discussed in the previous chapter can be used to propose a novel...
aiNet: An Artificial Immune Network for Data Analysis
Chapter 13
David Taniar, J. Wenny Rahayu
Data mining refers to a process of nontrivial extraction of implicit, previously unknown and potentially useful information (such as knowledge rules...
Parallel Data Mining
About the Authors