Evolutionary algorithms (EA) (Rechenberg, 1965) belong to a family of stochastic search algorithms inspired by natural evolution. In recent years, EA have been used successfully to produce efficient solutions for a great number of hard optimization problems (Beasley, 1997). These algorithms operate on a population of potential solutions and apply a survival principle, according to a fitness measure associated with each solution, to produce better approximations of the optimal solution. At each iteration, a new set of solutions is created by selecting individuals according to their level of fitness and by applying several operators to them. These operators model natural processes such as selection, recombination, mutation, migration, locality and neighborhood. Although the basic idea of EA is straightforward, the solution encoding, population size, fitness function and operators must be defined in accordance with the kind of problem to be optimized.

Multi-class problems with binary SVM (Support Vector Machine) classifiers are commonly treated as a decomposition into several binary sub-problems. An open question is how to properly choose the models for all these sub-problems in order to obtain the lowest error rate for a specific SVM multi-class scheme. In this paper, we propose a new approach to optimize the generalization capacity of such SVM multi-class schemes. This approach consists in selecting the models for all sub-problems globally and is denoted as multi-model selection. Multi-model selection can outperform the classical individual model selection used until now in the literature, but this type of selection defines a hard optimization problem, because it amounts to searching for an efficient solution in a huge space. Therefore, we propose an adapted EA to achieve this multi-model selection by defining a specific fitness function and recombination operator.
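The generational loop described above (fitness-based selection, recombination, mutation) can be sketched in pure Python. The toy real-valued fitness function, the operator choices and the parameter values below are illustrative assumptions for the sketch, not the chapter's actual setup:

```python
import random

def evolve(fitness, init, mutate, crossover, pop_size=20, generations=50, seed=0):
    """Generic generational EA: select parents by fitness, recombine, mutate."""
    rng = random.Random(seed)
    population = [init(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:pop_size // 2]          # truncation selection
        children = []
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)         # pick two parents
            children.append(mutate(crossover(a, b, rng), rng))
        population = children
    return max(population, key=fitness)

# Toy problem: maximize -(x - 3)^2, whose optimum is x = 3.
best = evolve(
    fitness=lambda x: -(x - 3.0) ** 2,
    init=lambda rng: rng.uniform(-10.0, 10.0),
    mutate=lambda x, rng: x + rng.gauss(0.0, 0.1),    # small Gaussian perturbation
    crossover=lambda a, b, rng: (a + b) / 2.0,        # blend recombination
)
```

On this toy problem the returned `best` individual lies close to the optimum at 3; in the chapter's setting, an individual would instead encode one SVM model per binary sub-problem.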
The multi-class classification problem refers to assigning a class to a feature vector from a set of possible ones. Among all possible inducers, Support Vector Machines (SVM) have particularly high generalization abilities (Vapnik, 1998) and have become very popular in the last few years. However, SVM are binary classifiers, and several combination schemes have been developed to extend SVM to problems with more than two classes (Rifkin & Klautau, 2005). These schemes are based on different principles: probabilities (Price, Knerr, Personnaz & Dreyfus, 1994), error-correcting codes (Dietterich & Bakiri, 1995), correcting classifiers (Moreira & Mayoraz, 1998) and evidence theory (Quost, Denoeux & Masson, 2006). All these combination schemes involve the following three steps: 1) decomposition of the multi-class problem into several binary sub-problems, 2) SVM training on all sub-problems to produce the corresponding binary decision functions and 3) a decoding strategy to take a final decision from all the binary decisions. The difficulties lie in the choice of the combination scheme (Duan & Keerthi, 2005) and in how to optimize it (Lebrun, Charrier, Lezoray & Cardot, 2005).
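The three steps can be illustrated with a one-versus-one decomposition and majority-vote decoding. The nearest-class-mean binary learner on 1-D data below is only a toy stand-in for a trained SVM, and all function names are illustrative assumptions:

```python
from itertools import combinations

def one_vs_one(train_binary, X, y, classes):
    """Steps 1-2: train one binary decision function per pair of classes."""
    models = {}
    for a, b in combinations(classes, 2):
        idx = [i for i, c in enumerate(y) if c in (a, b)]
        models[(a, b)] = train_binary([X[i] for i in idx],
                                      [+1 if y[i] == a else -1 for i in idx])
    return models

def decode(models, x):
    """Step 3: majority vote over all pairwise binary decisions."""
    votes = {}
    for (a, b), f in models.items():
        winner = a if f(x) >= 0 else b
        votes[winner] = votes.get(winner, 0) + 1
    return max(votes, key=votes.get)

def train_binary(Xb, yb):
    """Toy stand-in for an SVM: sign of distance to the two class means."""
    pos = [x for x, t in zip(Xb, yb) if t == +1]
    neg = [x for x, t in zip(Xb, yb) if t == -1]
    mp, mn = sum(pos) / len(pos), sum(neg) / len(neg)
    return lambda x: abs(x - mn) - abs(x - mp)   # positive -> class a

X = [0.0, 0.1, 1.0, 1.1, 2.0, 2.1]
y = ['a', 'a', 'b', 'b', 'c', 'c']
models = one_vs_one(train_binary, X, y, ['a', 'b', 'c'])
```

With three classes this builds three pairwise models; replacing `train_binary` with an SVM trainer (one hyper-parameter set per pair) is exactly the place where the multi-model selection discussed above operates.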
Key Terms in this Chapter
Support Vector Machine (SVM): An SVM maps input data into a higher-dimensional feature space by using a nonlinear function and finds in that feature space the optimal separating hyperplane, maximizing the margin (that is, the distance between the hyperplane and the closest points of the training set) while minimizing the number of misclassified patterns.
Model Selection: Model selection for Support Vector Machines concerns the tuning of the SVM hyper-parameters, such as the trade-off constant C and the kernel parameters.
Evolutionary Algorithm (EA): A meta-heuristic optimization approach inspired by natural evolution, which begins with a population of potential solutions, then iteratively selects the fittest individuals to serve as inputs to the next iteration, ultimately leading to a sub-optimal solution that is close to the optimal one.
Search Space: The set of all possible candidate solutions among which the problem we want to solve may find its answer.
Multi-Class Combination Scheme: A combination of several binary classifiers to solve a given multi-class problem.
Cross-Validation: A method of estimating the predictive error of inducers. The cross-validation procedure splits the dataset into k equal-sized pieces called folds. k predictive functions are built, each tested on a distinct fold after being trained on the remaining folds.
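A minimal sketch of this k-fold procedure, assuming a generic `train(X, y)` function that returns a predictor (the majority-class learner below is only a toy stand-in; all names are illustrative):

```python
def k_fold_error(train, X, y, k=5):
    """Estimate predictive error: k models, each tested on its held-out fold."""
    n = len(X)
    folds = [list(range(i, n, k)) for i in range(k)]   # k disjoint index sets
    errors = 0
    for fold in folds:
        held = set(fold)
        Xtr = [X[i] for i in range(n) if i not in held]
        ytr = [y[i] for i in range(n) if i not in held]
        f = train(Xtr, ytr)                            # train on remaining folds
        errors += sum(1 for i in fold if f(X[i]) != y[i])
    return errors / n                                  # average error over folds

def majority(Xtr, ytr):
    """Toy inducer: always predict the most frequent training label."""
    label = max(set(ytr), key=ytr.count)
    return lambda x: label

X = list(range(10))
y = ['a'] * 8 + ['b'] * 2
err = k_fold_error(majority, X, y, k=5)
```

In model selection, such a cross-validation estimate is a natural choice of fitness measure for comparing hyper-parameter settings.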
Trade-Off Constant of SVM: The trade-off constant, denoted C, sets the relative importance of maximizing the margin of the optimal separating hyperplane versus reducing predictive errors (i.e., examples that do not respect the margin distance from the separating hyperplane).