Feature Selection Based on Clonal Selection Algorithm: Evaluation and Application

Xiangrong Zhang (Xidian University, P.R. China) and Fang Liu (Xidian University, P.R. China)
DOI: 10.4018/978-1-60566-310-4.ch009

Feature selection is fundamental to tasks such as classification, data mining, image processing, and conceptual learning. Its goal is usually to achieve the same or better performance using fewer features. It can be treated as an optimization problem: find an optimal subset of the available features according to a given criterion function. The clonal selection algorithm is well suited to such optimization problems. It introduces the mechanisms of affinity maturation, cloning, and memorization, and the corresponding operations give it rapid convergence and good global search capability. In this study, the algorithm's rapid convergence to the global optimum is exploited to speed up the search for the most appropriate feature subset among a huge number of possible feature combinations. Compared with traditional genetic algorithm-based feature selection, clonal selection algorithm-based feature selection can find a better feature subset for classification. Experimental results on datasets from the UCI machine learning repository, on the classification of 16 types of Brodatz textures, and on synthetic aperture radar (SAR) image classification demonstrate the effectiveness and good performance of the method in applications.
Chapter Preview

1. Introduction

Feature selection is an active research area in pattern recognition, machine learning, and data mining. It was studied extensively in the NIPS 2003 workshop on feature extraction and the feature selection challenge, NIPS 2006 again included a workshop on feature selection, and FSDM 2006 was an international workshop on feature selection for data mining. A great deal of research on feature selection has therefore been carried out. Feature selection is defined as the process of choosing a subset of the original predictive variables by eliminating redundant features and those with little or no predictive information. If we extract as much information as possible from a given dataset while using the smallest number of features, we not only save a great amount of computing time and cost, but also improve generalization to unseen data points.

The majority of classification problems require supervised learning, where the underlying class probabilities and class-conditional probabilities are unknown and each instance is associated with a class label. In these situations, the relevant features are often unknown a priori, so many candidate features are introduced to better represent the domain. Unfortunately, many of these are either partially or completely irrelevant to the target concept. Reducing the number of irrelevant features drastically reduces the running time of a learning algorithm and yields a more general concept. This also gives better insight into the underlying concept of a real-world classification problem (Kohavi, & Sommerfield, 1995; Koller, & Sahami, 1994). Feature selection methods try to pick a subset of features that are relevant to the target concept (Blum, & Langley, 1997).

Recently, natural computation algorithms have been widely applied to feature selection (Yang, & Honavar, 1998) and feature synthesis (Li, Bhanu, & Dong, 2005; Lin, & Bhanu, 2005) to improve performance while reducing the feature dimension. Among them, the genetic algorithm (GA) is one of the most widely used in feature selection (Oh, Lee, & Moon, 2004; Raymer, Punch, Goodman, Kuhn, & Jain, 2000; Zio, Baraldi, & Pedroni, 2006). In this chapter, instead of using a GA to search for the optimal feature subset for classification, an effective global optimization technique from artificial immune systems (AISs), the clonal selection algorithm (de Castro, & Von Zuben, 1999, 2000, 2002; Du, Jiao, & Wang, 2002), is applied to feature selection. AISs are proving to be a very general and applicable form of bio-inspired computing. To date, AISs have been applied to various areas (Bezerra, de Castro, & Zuben, 2004; Dasgupta, & Gonzalez, 2002; de Castro, & Timmis, 2002; de Castro, & Zuben, 2002; Forrest, Perelson, Allen, & Cherukuri, 1994; Nicosia, Castiglione, & Motta, 2001; Timmis, & Neal, 2001; Zhang, Tan, & Jiao, 2004) such as machine learning, optimization, bioinformatics, robotic systems, network intrusion detection, fault diagnosis, computer security, and data analysis. The clonal selection algorithm was proposed as a computational realization of the clonal selection principle for pattern matching and optimization, and it has become perhaps the most popular algorithm in the field of AISs. This chapter investigates the performance of the clonal selection algorithm in feature selection.
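To make the idea of clonal selection as a search over feature subsets concrete, the following is a minimal Python sketch, not the chapter's actual method. Each antibody is a binary mask over the features; antibodies are cloned, clones are mutated at a rate that grows as affinity rank worsens, and the best mask seen so far is kept as a memory cell. The function names (affinity, hypermutate, clonal_selection), all parameter values, and the toy affinity function are assumptions made for illustration only. In a real application the affinity would be a classification-oriented criterion such as classifier accuracy on the selected features.

```python
import random


def affinity(mask, relevant, penalty=0.05):
    """Toy criterion (assumed): +1 per selected relevant feature, minus a size penalty."""
    hits = sum(1 for i, bit in enumerate(mask) if bit and i in relevant)
    return hits - penalty * sum(mask)


def hypermutate(mask, rate, rng):
    """Return a copy of mask with each bit flipped independently with probability rate."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in mask]


def clonal_selection(n_features, relevant, pop_size=20, n_clones=5,
                     generations=60, seed=0):
    """Return (best_mask, best_score) found by a simple clonal search."""
    rng = random.Random(seed)

    def score(mask):
        return affinity(mask, relevant)

    # Random initial antibody population of binary feature masks.
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    memory = max(population, key=score)  # memory cell: best mask so far

    for _ in range(generations):
        ranked = sorted(population, key=score, reverse=True)
        population = []
        for rank, antibody in enumerate(ranked):
            # Affinity maturation: better-ranked antibodies mutate less.
            rate = min(0.5, (rank + 1) / (2 * pop_size))
            clones = [hypermutate(antibody, rate, rng) for _ in range(n_clones)]
            # Clonal selection: keep the best of the parent and its clones.
            population.append(max(clones + [antibody], key=score))
        candidate = max(population, key=score)
        if score(candidate) > score(memory):
            memory = candidate
    return memory, score(memory)


if __name__ == "__main__":
    # Hypothetical setup: 12 features, of which features 0, 3, and 7 are relevant.
    best_mask, best_score = clonal_selection(12, relevant={0, 3, 7})
    selected = [i for i, bit in enumerate(best_mask) if bit]
    print("selected features:", selected, "affinity:", best_score)
```

Because each parent survives unless one of its clones scores higher, and the memory cell is only replaced by a strictly better mask, the returned score never falls below that of the best initial antibody. The size penalty plays the role of the preference for fewer features.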

Key Terms in this Chapter

Artificial Immune Systems: Artificial immune systems are adaptive systems inspired by theoretical immunology and observed immune functions, principles, and models, which are applied to complex problem domains.

Optimization: Find values of the variables that minimize or maximize the objective function while satisfying the constraints.

Pattern Classification: Pattern classification is a sub-topic of machine learning. It is concerned with the automatic discovery of regularities in data through the use of learning algorithms.

Natural Computation: Natural computation is the study of computational systems that are inspired by natural systems, including biological, ecological, physical, chemical, economic, and social systems.

Feature Selection: Feature selection attempts to select the smallest subset of features that causes no loss of performance, or even improves performance, compared with using all features.

SAR Image Classification: SAR image classification uses machine learning algorithms to classify land cover from SAR images.

Clonal Selection: The human immune response relies on the prior formation of an incredibly diverse population of B cells and T cells. The specificity of both the B-cell receptors and T-cell receptors, that is, the epitope to which a given receptor can bind, is created by a remarkable genetic mechanism. Each receptor is created even though the epitope it recognizes may never have been present in the body. If an antigen with that epitope should enter the body, those few lymphocytes able to bind to it will do so. If they also receive a second co-stimulatory signal, they may begin repeated rounds of mitosis. In this way, clones of antigen-specific lymphocytes (B and T) develop, providing the basis of the immune response. This phenomenon is called clonal selection.

Texture Classification: Texture is a fundamental property of surfaces. Texture classification is one of the four problem domains in the field of texture analysis. The other three are texture segmentation, texture synthesis, and shape from texture. The texture classification process involves two important phases: efficient description of image texture, and learning and recognition.
