1. Introduction
The computer revolution, with its inexpensive memory and high processing speeds, has renewed interest in nearest neighbor (NN) techniques and established the NN rule among the most popular nonparametric classification techniques. The K-Nearest Neighbor (KNN) classification rule is a powerful method that classifies an unknown instance using a set of labeled training instances.
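To make the rule concrete, the following is a minimal sketch of KNN classification (the function name and data layout are illustrative, not taken from the paper): an unknown instance is assigned the majority label among its k closest training instances.

```python
import math
from collections import Counter

def knn_classify(train, labels, query, k=3):
    """Assign `query` the majority label among its k nearest training instances."""
    # Sort training indices by Euclidean distance to the query point.
    nearest = sorted(range(len(train)),
                     key=lambda i: math.dist(train[i], query))
    # Majority vote among the k closest neighbors.
    votes = Counter(labels[i] for i in nearest[:k])
    return votes.most_common(1)[0][0]
```

With k = 1 this reduces to the 1NN rule considered later in the paper: the query simply takes the label of its single closest training instance.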
Compared to other well-known classifiers, neighborhood techniques remain very attractive owing to their ease of use. One limitation of the k-Nearest Neighbor rule (Suguna & Thanushkodi, 2010; Yu & Zhengguo, 2007) is its sensitivity to the size of the training set. If the number of training instances is too small, the accuracy of the KNN classifier is unacceptable; but if the training set is too large, the KNN classifier requires excessive running time. This problem can be addressed in three ways (Shi et al., 2007): by reducing the dimensionality of the feature space; by using smaller data sets; or by using an improved algorithm that accelerates the computation.
Computing a consistent training subset of minimal cardinality for the KNN rule turns out to be an NP-hard problem (Cover & Hart, 1967; Wilfong, 1992). Researchers have proposed instance selection techniques to address this problem. Like many other combinatorial problems, instance selection (IS) requires an exhaustive search to obtain an optimal solution. Among the existing methods, metaheuristics, especially genetic algorithms and tabu search, have proven their efficiency on this problem (Gil-Pita & Yao, 2007; Aci et al., 2010; Suguna & Thanushkodi, 2010; Ceveron & Ferri, 2010; Wu et al., 2011).
Recently, many ant-based algorithms have succeeded in solving different types of combinatorial optimization problems, such as the symmetric and asymmetric traveling salesman problems (Gambardella & Dorigo, 1995; Gambardella & Dorigo, 1996; Dorigo & Gambardella, 1997); the sequential ordering problem (Gambardella & Dorigo, 1997; Gambardella & Dorigo, 1999); the quadratic assignment problem (Gambardella & Dorigo, 1995); and the vehicle routing problem (Gambardella & Taillard, 1999).
In this paper, we propose a new approach to reducing the number of instances in the training set of the 1NN classifier, based on the concepts of ant colony optimization.
We note that the proposal described in this paper extends the algorithm described in our previous work (Miloud-Aouidate & Baba-Ali, 2012b). Our previous version presented some weaknesses, principally related to the stopping criterion. In this paper, we improve the previous algorithm by introducing a new stopping criterion and a new selection equation that condenses the selection computation, previously carried out in two stages, into a single step. In addition, this paper investigates the reduction aspect of the proposed algorithm, which constitutes one of the two pillars of instance selection.
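The proposed algorithm itself is detailed in later sections. As a rough illustration of the general idea, not the authors' exact method, the following sketch shows how ant colony optimization can be applied to instance selection: each ant builds a candidate subset guided by per-instance pheromone, subsets are scored by 1NN accuracy penalized by size, and good subsets deposit pheromone on their instances. All names, parameters, and the fitness definition here are assumptions for illustration only.

```python
import random

def dist2(a, b):
    # Squared Euclidean distance (monotone in distance, so fine for 1NN).
    return sum((x - y) ** 2 for x, y in zip(a, b))

def subset_accuracy(train, labels, subset):
    """Fraction of training instances correctly classified by their
    1-nearest neighbor within `subset` (excluding the instance itself)."""
    correct = 0
    for i in range(len(train)):
        cands = [j for j in subset if j != i]
        if not cands:
            continue
        nn = min(cands, key=lambda j: dist2(train[j], train[i]))
        correct += labels[nn] == labels[i]
    return correct / len(train)

def aco_select(train, labels, n_ants=8, n_iters=15, rho=0.2, w=0.3, seed=1):
    """Illustrative ACO-style instance selection (hypothetical sketch):
    pheromone per instance biases whether ants include it in a subset."""
    rng = random.Random(seed)
    n = len(train)
    pher = [1.0] * n                         # one pheromone value per instance
    best_subset, best_fit = list(range(n)), -1.0
    for _ in range(n_iters):
        iter_best, iter_fit = None, -1.0
        for _ in range(n_ants):
            # Include instance i with probability increasing in its pheromone.
            s = [i for i in range(n) if rng.random() < pher[i] / (1.0 + pher[i])]
            if not s:
                continue
            # Fitness balances 1NN accuracy against subset size (reduction).
            fit = subset_accuracy(train, labels, s) - w * len(s) / n
            if fit > iter_fit:
                iter_best, iter_fit = s, fit
        pher = [(1 - rho) * p for p in pher]  # evaporation
        if iter_best is not None:
            for i in iter_best:               # reinforce the iteration-best subset
                pher[i] += max(iter_fit, 0.01)
            if iter_fit > best_fit:
                best_subset, best_fit = iter_best, iter_fit
    return best_subset
```

The size penalty `w * len(s) / n` is one simple way to reward reduction alongside accuracy; the paper's actual selection equation and stopping criterion differ and are presented later.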