Research on Data Classification Method of Optimized Support Vector Machine Based on Gray Wolf Algorithm

The data classification method based on the support vector machine (SVM) has been widely used in various studies because SVM is a machine learning method that is nonlinear, highly precise, and has good generalization ability. The kernel function and its parameters have a great impact on the classification accuracy. In order to find the optimal parameters and improve the classification accuracy of SVM, this paper proposes a data multi-classification method based on an SVM optimized by the gray wolf algorithm (GWO-SVM). The iris data set is used to test the performance of GWO-SVM, and the classification results are compared with those of models based on the genetic algorithm (GA), particle swarm optimization (PSO), and the original SVM. The test results show that the GWO-SVM model has higher recognition and classification accuracy than the other three models and the shortest running time, so it has obvious advantages and can effectively improve the classification accuracy of SVM. This method has practical significance for image classification, text classification, and fault detection.


INTRODUCTION
Classification is a relatively old problem that has been widely studied in areas such as machine learning, pattern recognition, data mining, and artificial intelligence. A classification problem can be defined as follows: a given dataset, called the training dataset (Jiawei & Kamber, 2001; Zhongzhi, 2002), consists of a set of database tuples (often referred to as training samples, instances, or objects); each training sample is a feature vector consisting of attribute values or eigenvalues, and each training sample also carries a class label attribute. Qiuping et al. (2017) proposed an improved convergence-factor method based on the variation of the cosine law. Compared with the grid search algorithm, cross-validation, the genetic algorithm, and the particle swarm optimization algorithm, the outstanding advantage of the GWO algorithm is that each iteration takes into account both the global search (global search methods can be an effective tool for investigating the predictor space and identifying subsets of predictors that are optimally related to the response) and the local search, which not only greatly increases the probability of finding the optimal solution but also largely avoids premature convergence.
In order to solve the problem of low classification accuracy caused by poor settings of the SVM hyperparameters, namely the penalty factor c and the kernel parameter s, this paper applies the gray wolf algorithm to the optimization of the kernel function and its parameters and proposes a parameter optimization method for the support vector machine based on the gray wolf algorithm. The effectiveness of the method is verified by comparing its optimization results with those of the genetic algorithm, the particle swarm algorithm, and the original support vector machine. Compared with optimization algorithms of the same type, the iterative effect and optimization ability of GWO-SVM are greatly improved.

SVM ALGORITHM INTRODUCTION
SVM maps the input samples $x_1, x_2, \dots, x_n$ into a high-dimensional feature space $H$ through a nonlinear mapping function $\varphi(x)$ and performs linear regression in $H$. The regression function of SVM in the high-dimensional feature space is:

$$f(x) = w \cdot \varphi(x) + b \tag{1}$$

In the expression, $w$ represents the weight vector and $b$ represents the bias. According to the principle of structural risk minimization, this can be transformed into the following optimization problem:

$$\min_{w,\, b,\, \xi,\, \xi^*} \ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\left(\xi_i + \xi_i^*\right) \tag{2}$$

$$\text{s.t.} \quad \begin{cases} y_i - w \cdot \varphi(x_i) - b \le \varepsilon + \xi_i \\ w \cdot \varphi(x_i) + b - y_i \le \varepsilon + \xi_i^* \\ \xi_i,\ \xi_i^* \ge 0, \quad i = 1, \dots, n \end{cases} \tag{3}$$

In the formula, $\|w\|^2$ is the term related to the complexity of the function $f$; $\varepsilon$ is the insensitive loss coefficient; $\xi_i$ and $\xi_i^*$ represent the relaxation (slack) factors; and $C$ represents the penalty factor. With the introduction of Lagrange multipliers, the optimization problem becomes a convex quadratic optimization problem:

$$\max_{\alpha,\, \alpha^*} \ \sum_{i=1}^{n} y_i\left(\alpha_i - \alpha_i^*\right) - \varepsilon\sum_{i=1}^{n}\left(\alpha_i + \alpha_i^*\right) - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\left(\alpha_i - \alpha_i^*\right)\left(\alpha_j - \alpha_j^*\right)K(x_i, x_j) \tag{4}$$

In the formula, $\alpha_i$ and $\alpha_i^*$ represent the Lagrange multipliers. Solving in this dual form speeds up the solution, and the resulting SVM regression function is:

$$f(x) = \sum_{i=1}^{n}\left(\alpha_i - \alpha_i^*\right)K(x_i, x) + b \tag{5}$$

Establishing a classification prediction model based on SVM amounts to finding the optimal parameters $C$ and $s$ so that expression (5) best fits the sample data. Research shows that when prior knowledge of the process is lacking, the radial basis kernel function has fewer parameters and better performance than other kernel functions. Therefore, this paper chooses the radial basis kernel function as the kernel function of SVM, defined as:

$$K(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2s^2}\right) \tag{6}$$

where $s$ is the width parameter of the radial basis kernel function. It can be seen from the SVM modeling process that SVM learning performance is closely related to the penalty factor $C$ and the choice of the kernel parameter $s$ (Xiaodong, Guangliang, & Zixiang, 2012). The type of kernel function has a comparatively small effect on SVM performance, but the kernel function parameter and the error penalty factor $C$ are the key factors affecting it.
Therefore, choosing the appropriate kernel function parameters and error penalty factor C is critical to the performance of the learning machine. This paper optimizes the kernel parameters and error penalty factor C of the most widely used radial basis function SVM.
The performance of the support vector machine depends on multiple parameters. The kernel function parameter $s$ mainly affects the complexity of the distribution of the sample data in the high-dimensional feature space; a change of the kernel parameter actually implies a change of the VC dimension of the feature space, which affects the confidence range and ultimately the structural risk. The penalty factor $C$ controls the compromise between margin maximization and classification error: the larger $C$ is, the greater the penalty for misclassified samples; a small value of $C$ indicates a small penalty, a low learning-machine complexity, and a large empirical risk. The former case is called "over-learning" and the latter "under-learning". Therefore, in addition to optimizing $C$ within a given feature space to obtain the optimal SVM for that space, the kernel function parameter is also optimized to obtain the globally optimal SVM.
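To make the sensitivity to $C$ and $s$ concrete, the following is a minimal sketch using scikit-learn (which the paper does not mention): the train/test split, the specific parameter values, and the conversion gamma = 1/(2s²) from the width parameter $s$ to scikit-learn's `gamma` are all illustrative assumptions.

```python
# Sketch: how the penalty factor C and RBF width affect SVM test accuracy.
# Assumptions: scikit-learn API, a 60/40 split, and gamma = 1 / (2 * s**2).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0)

scaler = MinMaxScaler().fit(X_train)          # scale features to [0, 1]
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Sweep a few (C, gamma) pairs to show accuracy varies with the parameters.
for C, gamma in [(0.01, 0.01), (1.0, 0.1), (100.0, 10.0)]:
    clf = SVC(kernel="rbf", C=C, gamma=gamma).fit(X_train, y_train)
    print(f"C={C:<6} gamma={gamma:<5} accuracy={clf.score(X_test, y_test):.3f}")
```

A very small $C$ with a very smooth kernel tends toward under-learning, while a very large $C$ with a narrow kernel tends toward over-learning, which is exactly the trade-off described above.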

GWO OPTIMIZATION ALGORITHM
The GWO optimization algorithm is a meta-heuristic optimization algorithm proposed by Mirjalili et al. in 2014 (Mirjalili et al., 2014). It is a new type of swarm intelligence optimization algorithm. Relevant research shows that the algorithm has excellent performance in finding optimal solutions and has the advantages of simplicity and efficiency. The gray wolf belongs to the canine family; it sits at the top of the natural food chain and is regarded as an apex predator. Most gray wolves live in groups, each averaging 5 to 12 individuals. In normal life, and especially during hunting, they follow a very strict social hierarchy and division of tasks. In the GWO algorithm, the highest rank is the head wolf pair, one male and one female, marked as $\alpha$, which is responsible for decision-making during hunting (optimization) and for leading the wolf pack. The remaining wolves are labeled $\beta$, $\delta$, and $\omega$ by social rank. Each level obeys the leadership of the level above it and carries out the corresponding group hunting actions, as shown in Figure 1.
The GWO algorithm imitates the hunting behavior process of wolves, which is mainly divided into three steps, namely encircling, hunting, and attacking. The modeling process of each step is as follows.

Surround
During the hunting process, the wolves first surround the target. The mathematical model of this process is:

$$D = \left|C \cdot X_p(t) - X(t)\right|$$

$$X(t+1) = X_p(t) - A \cdot D$$

where $t$ is the current iteration, $A$ and $C$ are coefficient vectors, $X_p$ is the global optimal solution vector (prey location), and $X$ is the potential solution vector (wolf location). The values of $A$ and $C$ are calculated by:

$$A = 2a \cdot r_1 - a, \qquad C = 2 \cdot r_2$$

where $a$ decreases linearly from 2 to 0 over the course of the iterations and $r_1$, $r_2$ are random vectors in $[0, 1]$. $X(t+1)$ is the updated potential optimal solution vector. The location update process is shown in Figure 2.

Attack
The model of this process is mainly realized through the decrement of the value of $a$ in the formula for $A$ above. As $a$ decreases linearly from 2 to 0, $A$ correspondingly takes random values in the interval $[-a, a]$. When $|A| < 1$, the wolf pack converges between its current position and the prey position and concentrates on attacking the prey $(X^*, Y^*)$; when $|A| > 1$, the wolves move away from the prey position and a global search is performed to find more suitable prey.
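The three steps above (encircling, hunting, and attacking) can be sketched as follows on a toy objective. This is a minimal illustration of the standard GWO update rules, not the paper's implementation: the sphere objective, population size, iteration count, bounds, and seed are all assumptions, and the hunting step is realized by averaging the pulls toward $\alpha$, $\beta$, and $\delta$.

```python
# Hedged sketch of the GWO update loop: encircle (D, X_p - A*D),
# hunt (average of pulls toward alpha, beta, delta), attack (a -> 0).
import numpy as np

def gwo(objective, dim, n_wolves=10, max_iter=50, lb=-10.0, ub=10.0, seed=0):
    rng = np.random.default_rng(seed)
    wolves = rng.uniform(lb, ub, size=(n_wolves, dim))
    fitness = np.array([objective(w) for w in wolves])
    order = np.argsort(fitness)                # three best wolves lead the pack
    alpha, beta, delta = (wolves[order[k]].copy() for k in range(3))
    for t in range(max_iter):
        a = 2.0 - 2.0 * t / max_iter           # a decreases linearly from 2 to 0
        for i in range(n_wolves):
            x_new = np.zeros(dim)
            for leader in (alpha, beta, delta):
                r1, r2 = rng.random(dim), rng.random(dim)
                A = 2.0 * a * r1 - a           # |A| > 1 -> explore, |A| < 1 -> attack
                C = 2.0 * r2                   # random weight in [0, 2]
                D = np.abs(C * leader - wolves[i])   # encircling distance
                x_new += leader - A * D
            wolves[i] = np.clip(x_new / 3.0, lb, ub)  # hunting: average the pulls
        fitness = np.array([objective(w) for w in wolves])
        order = np.argsort(fitness)
        alpha, beta, delta = (wolves[order[k]].copy() for k in range(3))
    return alpha, objective(alpha)

best_pos, best_val = gwo(lambda x: float(np.sum(x**2)), dim=2)
print(best_pos, best_val)
```

Run on the sphere function, the pack contracts toward the origin as $a$ (and therefore $|A|$) shrinks, which mirrors the transition from global search to attacking described above.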

GWO-SVM MODEL CONSTRUCTION
Since the penalty factor $c$ and the kernel parameter $s$ in the SVM have a great influence on the classification accuracy, this paper uses GWO to optimize these parameters, with $(c, s)$ constituting the position vector of an individual gray wolf. The gray wolf group searches mainly according to the positions of $\alpha$, $\beta$, and $\delta$, and the wolves separate from each other to find prey. In GWO, when the coefficient $A$ takes a value of magnitude greater than 1, individual gray wolves deviate from the target prey. This behavioral mechanism allows GWO to search on a global scale and enhances the exploration performance of the algorithm. Another component of GWO exploration is the coefficient $C$, a random value in the interval $[0, 2]$, which provides a random weight for the prey, randomly strengthening ($C > 1$) or weakening ($C < 1$) the prey's influence on the distance. The parameter $C$ helps GWO show more random behavior during optimization, which improves the global exploration ability of the algorithm; especially in later iterations, $C$ can effectively make the algorithm jump out of local optima and find the global optimum, so as to obtain the values of $c$ and $s$ that give the highest classification accuracy and optimize the classification result. The implementation process of GWO-SVM is shown in Figure 3.
Step 1: Input the sample data and set the training set and test set of SVM.
Step 2: Initialize the value range of c and s in SVM.
Step 3: Randomly generate the gray wolf group. The position vector of each individual gray wolf is composed of the penalty factor c and the kernel parameter s; set the relevant parameters of GWO.
Step 4: According to the initial parameters c and s, train the SVM on the training set samples; the individual fitness function is expressed by the classification error rate of the SVM, which is used to calculate the fitness of each gray wolf.
Step 5: Rank the wolves by fitness and designate the three best individuals as α, β, and δ.
Step 6: Update the position of each individual in the wolf pack according to the position-update equations above.
Step 7: Calculate the fitness value of each gray wolf individual at its new position, and reselect α, β, and δ from the current wolf group.
Step 8: If the number of iterations exceeds the maximum allowable number of iterations, the training ends, and the output global optimal position is the optimal value of c and s in the SVM; otherwise, skip to step 5 to continue parameter optimization.
Step 9: Use the optimal parameters c and s to establish a model, test the test set samples, and analyze and verify the test results.
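The fitness evaluation in Steps 4 and 7 can be sketched as follows. The paper only states that fitness is the SVM classification error rate; the 5-fold cross-validation estimate, the scikit-learn API, and the gamma = 1/(2s²) conversion used here are assumptions for illustration.

```python
# Hedged sketch of the GWO-SVM fitness function: a wolf's position (c, s)
# is scored by the SVM classification error it produces on the sample data.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

def fitness(position):
    c, s = position
    c = float(np.clip(c, 0.01, 100.0))     # search interval used in the paper
    s = float(np.clip(s, 0.01, 100.0))
    gamma = 1.0 / (2.0 * s**2)             # RBF width parameter -> sklearn gamma
    model = SVC(kernel="rbf", C=c, gamma=gamma)
    error = 1.0 - cross_val_score(model, X, y, cv=5).mean()
    return error                            # lower error = fitter wolf

print(fitness(np.array([10.0, 1.0])))
```

Minimizing this function with a GWO loop (as in the earlier sketch) yields the optimal c and s, which are then used to build the final model in Step 9.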

Instance Data
For the sake of proving the advantages of the gray wolf algorithm in optimizing the support vector machine for data classification, this paper selects the iris data set, a multivariate data set often used as an example in both statistical learning and machine learning; it is frequently treated as the "Hello World" of data science. It contains five columns, namely Sepal Length, Sepal Width, Petal Length, Petal Width, and Species.

Data Preprocessing
The GWO-SVM algorithm is used to identify and classify iris varieties. In this paper, four characteristic variables are selected as input variables, namely sepal length, sepal width, petal length, and petal width. For recognition and classification, the target outputs are set as 1, 2, and 3, representing varieties 1, 2, and 3 respectively. Since the SVM model is sensitive to the scale of the input data, it is necessary to normalize the feature vectors to the range [0, 1] before inputting the training samples in order to improve training efficiency (Figure 4 shows the original feature data, and Figure 5 shows the data after normalization), namely:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

where $x$ is the original sample data and $x_{\min}$, $x_{\max}$ are the minimum and maximum values of the corresponding feature. The normalized training samples are input into the support vector machine model, and GWO is used to optimize the parameters c and s in the SVM, which improves on the traditional way of setting the values randomly. The parameters c and s constitute the position vector of an individual in the gray wolf pack, and the classification error rate of the SVM on the iris samples is taken as the individual fitness function. In this paper, the number of wolves is set to 10, the maximum number of iterations is set to 10, and the search interval of the parameters c and s is [0.01, 100]. The optimal parameters c and s obtained by GWO optimization are used to construct the SVM model, and finally the constructed GWO-SVM model is tested on the iris test samples.
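The normalization step above can be sketched in a few lines; scikit-learn's iris loader stands in here for the paper's data files, which is an assumption.

```python
# Min-max normalization of the iris features to [0, 1]:
# x' = (x - x_min) / (x_max - x_min), computed per feature column.
import numpy as np
from sklearn.datasets import load_iris

X, _ = load_iris(return_X_y=True)
x_min, x_max = X.min(axis=0), X.max(axis=0)
X_norm = (X - x_min) / (x_max - x_min)   # each column now spans [0, 1]
print(X_norm.min(), X_norm.max())
```

After this transformation, every feature lies in [0, 1], so no single attribute dominates the RBF distance computation.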

Example Results and Analysis
For the sake of proving the effectiveness and advantages of GWO-SVM, the genetic algorithm (GA) and standard particle swarm optimization (PSO) are also used to optimize the kernel function parameter and penalty factor of the established SVM, the optimized SVM models are used to identify and classify the iris varieties, and the results are compared with those of the SVM optimized by GWO. Among them, GA selects the crossover probability and mutation probability according to the fitness values of the objective function, which reduces the convergence time, improves the precision of GA, and ensures the accuracy of parameter selection. PSO uses its fast global optimization capability to search for the SVM parameters, which reduces the blindness of trial and error and improves the accuracy of model prediction. The SVM classification and recognition results based on the three optimization algorithms are shown in Figures 6-9. It can be seen from Figures 6 to 9 that GWO-SVM has the best recognition and classification effect, with only 2 test samples classified incorrectly. The PSO-SVM model has 4 test sample recognition errors; the GA-SVM model has 8 test sample recognition errors; and the SVM model without algorithm optimization has 10 test sample recognition errors, giving the worst recognition effect. The optimization results for the penalty factor and kernel parameter of the SVM model obtained by the three optimization algorithms are shown in Table 2. Table 3 shows the accuracy of the GWO-SVM, PSO-SVM, GA-SVM, and SVM models in identifying and classifying the test set, as well as the running time of each program.
As can be seen from Table 3, the accuracy of the original SVM model without optimization for iris variety identification and classification is 91.67%, with a running time of 1.78 s, the lowest accuracy and longest running time among the four models. The accuracy of the GA-SVM model is 93.34%, with a running time of 1.32 s; this is because the GA cannot make timely use of feedback information, so its search is relatively slow and more training time is required to obtain an accurate solution. The accuracy of the PSO-SVM model is 96.67%, with a running time of 0.96 s; however, the PSO algorithm easily falls into local optima, making it difficult to obtain an accurate optimal solution. The accuracy of the GWO-SVM model is 98.33%, the highest among the four models. In addition, the running time of the GWO-SVM model is 0.63 s, which is also very advantageous. Therefore, the GWO-SVM model has obvious advantages and better application prospects.

CONCLUSION
In order to improve the classification accuracy of SVM, this paper optimized the penalty factor and kernel parameter of the support vector machine based on the gray wolf algorithm and obtained the GWO-SVM multi-classification model. The performance of GWO-SVM was tested using the iris data set and compared with the classification results of the GA-SVM, PSO-SVM, and original SVM models. The simulation experiments verify that the GWO-SVM classification model has better stability and superiority. The test results show that the SVM model has a great advantage in small-sample data classification, but its accuracy and running time are influenced by the penalty factor and kernel parameter of the SVM. After algorithm optimization, the recognition and classification accuracy of the GWO-SVM model is improved to 98.33%, and its program running time is the shortest. Compared with the other models, the GWO-SVM model can significantly improve the classification performance of the support vector machine, which gives it obvious advantages and better application prospects.