Evolutionary Algorithm for Identifying Predictive Genes

DOI: 10.4018/978-1-60960-557-5.ch012

Evolutionary Search For Optimal Or Near-Optimal Set Of Genes

An overview of feature selection methods would not be complete without evolutionary methods, which are built on the principles of biological evolution: natural selection and survival of the fittest. Evolutionary techniques perform a stochastic search and optimization in the solution space. In contrast to many traditional optimization techniques, they conduct a parallel search through a population of potential solutions.

In this chapter, we consider the evolutionary algorithm proposed in a number of articles (Deutsch, 2001; Deutsch, 2003; Jirapech-Umpai & Aitken, 2005). The algorithm in (Jirapech-Umpai & Aitken, 2005) is a modification of the statistical replication algorithm of (Deutsch, 2001; Deutsch, 2003). We therefore base our discussion on (Jirapech-Umpai & Aitken, 2005), as it is the most recent of the three.

The algorithm from (Jirapech-Umpai & Aitken, 2005) addresses the multi-class classification of microarray data, i.e., settings where the data are split into more than two classes. It can, however, be safely applied to binary classification problems as well.

Main Idea

The main idea of (Jirapech-Umpai & Aitken, 2005) and related methods is based on the genetic principles of evolution in nature: selection, mutation, and survival of the fittest.

The evolutionary algorithm maintains a population of predictors (gene subsets) whose fitness for classifying microarray data is assessed by a classifier. Initially, each predictor is created by randomly selecting genes from the initial pool of genes. This pool may include all genes in the study or only a subset pre-selected by another technique, as was done in (Jirapech-Umpai & Aitken, 2005). Any suitable algorithm can serve as the classifier; in (Jirapech-Umpai & Aitken, 2005), k-nearest-neighbor (KNN) with the Euclidean distance function was chosen for its simplicity and its competitive performance compared to more sophisticated algorithms (Dudoit & Fridlyand, 2003). The classifier returns the strength of each predictor, which is related to the classification error attained when classifying the validation data with that predictor alone.
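As a concrete illustration, the sketch below scores a predictor (a list of gene indices) by the validation accuracy of a plain Euclidean-distance KNN classifier restricted to those genes. All function names and the toy expression data are hypothetical; this is a minimal reading of the scheme described above, not the authors' actual implementation.

```python
import math
import random
from collections import Counter

def knn_predict(x, train, k=3):
    """Majority vote among the k nearest training samples,
    using Euclidean distance on the selected genes."""
    nearest = sorted(train, key=lambda s: math.dist(x, s[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def predictor_strength(predictor, X_train, y_train, X_val, y_val, k=3):
    """Score a predictor (gene indices) by KNN accuracy on the
    validation data when each sample is reduced to just those genes."""
    project = lambda row: [row[g] for g in predictor]
    train = [(project(x), y) for x, y in zip(X_train, y_train)]
    hits = sum(knn_predict(project(x), train, k) == y
               for x, y in zip(X_val, y_val))
    return hits / len(y_val)

# Toy expression matrix: 30 samples x 20 genes, 3 classes.
random.seed(1)
labels = [i % 3 for i in range(30)]
X = [[random.gauss(0.0, 1.0) for _ in range(20)] for _ in labels]
for row, c in zip(X, labels):      # make genes 4 and 9 informative
    row[4] += 2.0 * c
    row[9] -= 2.0 * c
strength = predictor_strength([4, 9], X[:20], labels[:20], X[20:], labels[20:])
```

A strength near 1.0 indicates that the two selected genes alone separate the three classes well; a predictor of uninformative genes would score near chance level.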

At each iteration (generation) of the evolutionary algorithm, the replication of a particular predictor depends on how mutation affects the predictor's strength (scoring function). A mutation takes one of three forms: 1) adding one gene, randomly chosen from the initial pool of genes, 2) deleting one gene from the predictor, or 3) doing nothing, i.e., keeping the predictor unchanged. The selection process uses a statistical replication algorithm: predictors that score higher after mutation survive into the next generation, while the worst predictor in the current generation is replaced with the best predictor from the previous generation. Iterations terminate when either the maximum number of generations is reached or all predictors deliver similar classification performance over a specified number of generations. After termination, the best-scoring (most accurate) predictor is the one used to classify the test data.
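The generation loop can be sketched as follows, under two simplifying assumptions: a mutation is accepted greedily (kept only if it does not lower the score) rather than via the statistical replication rule of the original papers, and the toy Jaccard-overlap scoring function merely stands in for the KNN-based strength. All names here are hypothetical.

```python
import random

def mutate(predictor, gene_pool):
    """One of three moves: add a random pool gene, delete a gene,
    or keep the predictor unchanged."""
    move = random.choice(["add", "delete", "keep"])
    mutant = set(predictor)
    if move == "add" and len(mutant) < len(gene_pool):
        mutant.add(random.choice([g for g in gene_pool if g not in mutant]))
    elif move == "delete" and len(mutant) > 1:
        mutant.remove(random.choice(sorted(mutant)))
    return mutant

def evolve(population, gene_pool, strength, generations=100, tol=1e-9, patience=5):
    """Iterate mutation and replication; stop when the generation budget
    is spent or all predictors score alike for `patience` generations."""
    stable = 0
    for _ in range(generations):
        survivors = []
        for p in population:            # keep a mutation only if it helps
            m = mutate(p, gene_pool)
            survivors.append(m if strength(m) >= strength(p) else p)
        scores = [strength(p) for p in survivors]
        # replication: the worst predictor is overwritten by the best one
        survivors[scores.index(min(scores))] = set(survivors[scores.index(max(scores))])
        population = survivors
        stable = stable + 1 if max(scores) - min(scores) < tol else 0
        if stable >= patience:
            break
    return max(population, key=strength)

# Toy run: pretend genes 3, 7, 11 are the truly predictive ones and score
# each predictor by its Jaccard overlap with that target set.
random.seed(2)
pool = list(range(50))
target = {3, 7, 11}
toy_strength = lambda p: len(p & target) / len(p | target)
start = [set(random.sample(pool, 4)) for _ in range(10)]
best = evolve(start, pool, toy_strength, generations=300)
```

Because a mutation is never accepted when it lowers a predictor's score, and the worst predictor is replaced by a copy of the best, the top score in the population is non-decreasing across generations.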
