Improved Equilibrium Optimizer for Short-Term Traffic Flow Prediction

Meta-heuristic algorithms have been widely used in deep learning. This paper proposes a hybrid algorithm, EO-GWO, to train the parameters of a long short-term memory (LSTM) network; it balances the abilities of exploration and exploitation. EO-GWO utilizes the grey wolf optimizer (GWO) to further refine the optimal solutions acquired by the equilibrium optimizer (EO) without adding extra evaluations of the objective function. Short-term traffic flow is highly non-linear and uncertain, and it has a strong correlation with time. This paper combines the LSTM structure with EO-GWO to implement the prediction: the hyperparameters of the LSTM are optimized by EO-GWO to overcome the shortcomings of backpropagation. Experiments show that the algorithm achieves good results in both accuracy and computation time for the three prediction models at a highway intersection.


INTRODUCTION
With the development of the social economy, the number of vehicles is increasing rapidly, and traffic congestion seriously affects road safety and worsens environmental pollution. Transportation departments formulate management strategies and improve service levels by utilizing existing highway facilities and transportation network resources (Chou et al., 2018). In recent years, intelligent transportation systems (ITS) have become a hot research area (Wang, Chen, Cheng, Lin, and Lo, 2015; Zhuang, Luo, Pan, and Pan, 2020; Song, Pan, and Chu, 2020). In this paper, long short-term memory (LSTM) and meta-heuristics are studied to predict short-term traffic flow.
The recurrent neural network (RNN) suffers from vanishing and exploding gradients when training its parameters, so the historical information it actually uses is very limited. Hochreiter and Schmidhuber (1997) proposed long short-term memory (LSTM) in 1997 to improve the traditional RNN model. The LSTM unit is composed of a memory cell together with input, forget and output gates. The cell is the core computing node and is used to record the current time state. Figure 1 shows the unit of LSTM and its equations are as follows:

$$f_t = \sigma\left(w_f \cdot [h_{t-1}, x_t] + b_f\right) \qquad (1)$$

$$i_t = \sigma\left(w_i \cdot [h_{t-1}, x_t] + b_i\right) \qquad (2)$$

$$c_t = f_t \odot c_{t-1} + i_t \odot \tanh\left(w_c \cdot [h_{t-1}, x_t] + b_c\right) \qquad (3)$$

$$o_t = \sigma\left(w_o \cdot [h_{t-1}, x_t] + b_o\right) \qquad (4)$$

$$h_t = o_t \odot \tanh(c_t) \qquad (5)$$

where $i$, $f$, $c$ and $o$ are the input gate, forget gate, cell state and output gate respectively, $w$ and $b$ are the weight coefficients and biases, $\sigma$ is the sigmoid activation function, and $x_t$ and $h_t$ are the input and output at time $t$. The LSTM model includes the following steps: (1) The output value of the cell is calculated according to Eqs. (1)-(5).
(2) The error term of each cell is computed in the reverse direction (backpropagation through time).
(3) The weights and biases are updated with gradient descent.
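As an illustration of step (1), the forward pass of Eqs. (1)-(5) can be sketched in NumPy. This is a minimal sketch, not the authors' implementation; the dictionary-of-gates layout and all names are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step following Eqs. (1)-(5).

    Each W[k] maps the concatenated vector [h_prev, x_t] to one gate."""
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(W["f"] @ z + b["f"])                    # Eq. (1): forget gate
    i = sigmoid(W["i"] @ z + b["i"])                    # Eq. (2): input gate
    c = f * c_prev + i * np.tanh(W["c"] @ z + b["c"])   # Eq. (3): cell state
    o = sigmoid(W["o"] @ z + b["o"])                    # Eq. (4): output gate
    h = o * np.tanh(c)                                  # Eq. (5): hidden output
    return h, c
```

In practice a framework implementation would be used; the sketch only makes the gate arithmetic explicit.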

Equilibrium Optimizer (EO)
In the EO, each particle represents a candidate solution, and its concentration is the position of the particle. The particles adopt Eq. (6) to update their concentrations toward the global optimal solutions (equilibrium candidates), and they finally achieve equilibrium (the optimal solution).
where $X_i(n)$ is the position of the $i$-th particle at the $n$-th iteration. Since $V$ is a unit volume, the update has also been expressed as follows:

$$X_i(n+1) = X_{eq}(n) + \left(X_i(n) - X_{eq}(n)\right) \cdot F(n) + \frac{G(n)}{\lambda}\left(1 - F(n)\right) \qquad (7)$$

$X_{eq}$ is the equilibrium concentration and it is chosen from the equilibrium pool.
$$X_{eq,pool} = \left\{X_{eq1},\; X_{eq2},\; X_{eq3},\; X_{eq4},\; X_{eq(avg)}\right\} \qquad (8)$$

$X_{eq1}$, $X_{eq2}$, $X_{eq3}$ and $X_{eq4}$ are the positions of the first four optimal solutions in the control volume, $X_{eq(avg)}$ is the average position of the four optimal solutions, and $X_{eq}$ is randomly picked from $X_{eq1}$, $X_{eq2}$, $X_{eq3}$, $X_{eq4}$ and $X_{eq(avg)}$. $F$ is an exponential term and controls the balance between exploration and exploitation of the algorithm. In the early phase, EO is able to search more of the space, and it gradually gains powerful exploitation as the algorithm evolves:

$$F = a_1 \cdot \mathrm{sign}(r - 0.5) \cdot \left(e^{-\lambda t} - 1\right) \qquad (9)$$

$$t = \left(1 - \frac{n}{Max\_iter}\right)^{a_2 \frac{n}{Max\_iter}} \qquad (10)$$
where Max_iter means the maximum iteration.
$\lambda$ and $r$ are two random vectors in $[0, 1]$. $a_1$ controls the algorithmic exploration and $a_2$ determines its exploitation. Experiments show that with $a_1 = 2$ and $a_2 = 1$, EO has vigorous abilities of exploration and exploitation. $\mathrm{sign}$ is a signum function and controls the direction of exploration; its value is computed by Eq. (11):

$$\mathrm{sign}(x) = \begin{cases} 1, & x > 0 \\ 0, & x = 0 \\ -1, & x < 0 \end{cases} \qquad (11)$$
$G$ makes the algorithm powerfully find the optimal solution at the exploration stage and it is acquired by the following equations:

$$G = G_0 \cdot F \qquad (12)$$

$$G_0 = GCP \cdot \left(X_{eq} - \lambda X\right) \qquad (13)$$

$$GCP = \begin{cases} 0.5\, r_1, & r_2 \ge GP \\ 0, & r_2 < GP \end{cases} \qquad (14)$$
where $r_1$ and $r_2$ are two random numbers in $[0, 1]$. $GP$ governs, through the generation rate, the probability of participating in the position update. $GP = 1$ implies that no particles join the generation procedure and EO has high exploration ability, while $GP = 0$ means that $X_{eq}$ always participates in the optimization process. Empirical testing illustrates that EO achieves a good balance between exploitation and exploration when $GP = 0.5$. If $G$ and $F$ have the same signs, $G$ assists the algorithm in searching more of the space; when they have opposite signs, $G$ is helpful for local search.
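To make the update rule concrete, a single-particle concentration update following Eqs. (7)-(14) can be sketched as below. This is an illustrative sketch under the stated equations, not the authors' code; the function and variable names are assumptions:

```python
import numpy as np

def eo_update(X, X_eq, n, max_iter, a1=2.0, a2=1.0, GP=0.5, rng=None):
    """One EO concentration update (Eq. (7)) for a single particle.

    X, X_eq : current position and the chosen equilibrium candidate."""
    rng = rng or np.random.default_rng()
    dim = X.shape[0]
    lam = rng.random(dim)                                   # random vector lambda in [0,1]
    r = rng.random(dim)                                     # random vector r in [0,1]
    t = (1.0 - n / max_iter) ** (a2 * n / max_iter)         # Eq. (10)
    F = a1 * np.sign(r - 0.5) * (np.exp(-lam * t) - 1.0)    # Eq. (9)
    r1, r2 = rng.random(), rng.random()
    GCP = 0.5 * r1 if r2 >= GP else 0.0                     # Eq. (14)
    G0 = GCP * (X_eq - lam * X)                             # Eq. (13)
    G = G0 * F                                              # Eq. (12)
    return X_eq + (X - X_eq) * F + (G / lam) * (1.0 - F)    # Eq. (7), V = 1
```

A full EO run would apply this update to every particle and refresh the equilibrium pool each iteration.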

HYBRID OPTIMIZATION ALGORITHM EO-GWO
The four optimal solutions control the search direction of EO, and they affect the solution quality of the algorithm. To improve the convergence rate and avoid falling into local traps, this section proposes a hybrid algorithm EO-GWO, which includes a strategy to jump out of the local optimum.
Hybrid algorithms usually enhance the best solutions of the population to improve the search speed, such as the gbest of PSO (Guo, Chen, Yu, Su, and Liu, 2015) and the α, β and γ of the grey wolf optimizer (GWO) (Mirjalili, Mirjalili, and Lewis, 2014). These refined solutions must be evaluated by the objective function. If the objective function is cheap to evaluate, the running time of the hybrid algorithm is barely affected; however, performance suffers seriously when each evaluation takes a lot of time.
It is known from Eqs. (7) and (8) that the population is guided by the four optimal solutions of EO. If they are close to the global optimum, the other particles quickly gather in this area and the convergence speed improves. EO-GWO adopts the GWO algorithm to optimize these four solutions, so that better solutions are obtained and the exploitation ability of the algorithm is promoted. Crucially, the four optimal solutions are enhanced without calling the objective function, which improves execution efficiency. Suppose α, β and γ are always $X_{eq1}$, $X_{eq2}$ and $X_{eq3}$. Their positions are updated by the following equations.
$$X_1 = X_\alpha - A_1 \cdot \left|C_1 X_\alpha - X_i(n)\right|, \quad X_2 = X_\beta - A_2 \cdot \left|C_2 X_\beta - X_i(n)\right|, \quad X_3 = X_\gamma - A_3 \cdot \left|C_3 X_\gamma - X_i(n)\right|$$

$$X_i(n+1) = \frac{X_1 + X_2 + X_3}{3}, \qquad A = 2a\,r_1 - a, \qquad C = 2\,r_2$$

where $r_1$ and $r_2$ are two random vectors in $[0, 1]$, and $a$ decreases linearly from 2 to 0 over the iterations. After this step, the optimized $X_{eq1}$, $X_{eq2}$, $X_{eq3}$ and $X_{eq4}$ are evaluated with the objective function. EO-GWO judges whether they are superior to the original values and selects the four best solutions as the new positions of $X_{eq1}$, $X_{eq2}$, $X_{eq3}$ and $X_{eq4}$. The other particles are still updated by Eq. (7).
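The GWO-style refinement of a candidate toward the three best equilibrium candidates can be sketched as follows. This is a minimal illustration of the standard GWO position update, not the authors' implementation; names are hypothetical:

```python
import numpy as np

def gwo_refine(X, X_alpha, X_beta, X_gamma, a, rng=None):
    """Refine particle X toward the three leaders (alpha, beta, gamma)
    with the standard GWO update; `a` decreases linearly from 2 to 0."""
    rng = rng or np.random.default_rng()

    def candidate(leader):
        A = 2.0 * a * rng.random(X.shape[0]) - a   # A = 2a*r1 - a
        C = 2.0 * rng.random(X.shape[0])           # C = 2*r2
        return leader - A * np.abs(C * leader - X)

    # Average of the three leader-guided candidate positions
    return (candidate(X_alpha) + candidate(X_beta) + candidate(X_gamma)) / 3.0
```

Note that with a = 0 the update collapses to the mean of the three leaders, which matches the late-stage exploitation behaviour described above.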
In EO, $X_{eq1}$, $X_{eq2}$, $X_{eq3}$ and $X_{eq4}$ have the same probability of directing the algorithm. Although this strategy makes the algorithm search more of the space, it reduces the convergence rate. To improve the solution speed, EO-GWO selects them in the proportion 4:2:2:1. Particles then have a high probability of moving toward $X_{eq1}$ while still exploring unknown space.
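The biased 4:2:2:1 selection can be realized with a weighted random draw; a minimal sketch (the function name and candidate layout are hypothetical, assuming the four candidates are ordered best-first):

```python
import random

def pick_equilibrium(candidates, rng=random):
    """Select one of the four equilibrium candidates with the biased
    4:2:2:1 proportion (best candidate listed first)."""
    return rng.choices(candidates, weights=[4, 2, 2, 1], k=1)[0]
```

Over many draws the best candidate is chosen about 4/9 of the time, so the swarm is pulled mainly toward $X_{eq1}$ while the other candidates keep some influence.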
The above-mentioned methods accelerate convergence, but when the objective function has multiple local optima, or one solution is far better than the rest, EO-GWO easily falls into local traps and can make no further progress in searching for solutions. An escaping approach is therefore added to EO-GWO to improve its exploration ability.
In meta-heuristics, exploration is the ability to investigate different unidentified areas of the solution space and to locate the global optimum. If the fitness value of a particle is not updated for a long time, the particle's current search region is probably not where the optimal solution lies, so it is necessary to force the particle to search elsewhere by learning from other particles. The update for each dimension $d$ is as follows:

$$X_i^d(n+1) = X_{rj}^d(n)$$

where $rj$ is a random integer in $[1, popSize]$ with $rj \neq i$, drawn independently for each dimension, and $popSize$ is the population size. Each dimension of the particle thus randomly learns from the surrounding particles: if the learned region contains a better solution, the particle moves toward it; if not, the other dimensions influence its movement. This strategy improves exploration while balancing exploitation. The pseudo code of the algorithm is shown in Figure 2.
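The escaping strategy can be sketched as a per-dimension copy from randomly chosen neighbours. This is an illustrative sketch under the equation above (it assumes each dimension draws its own donor particle; names are hypothetical):

```python
import numpy as np

def escape(i, population, rng=None):
    """Escape step for a stagnant particle i: each dimension copies the
    same dimension of a randomly chosen other particle.

    population : (pop_size, dim) array; requires pop_size >= 2."""
    rng = rng or np.random.default_rng()
    pop_size, dim = population.shape
    new_pos = population[i].copy()
    for d in range(dim):
        rj = rng.integers(pop_size)
        while rj == i:                 # rj != i, per the definition above
            rj = rng.integers(pop_size)
        new_pos[d] = population[rj, d]
    return new_pos
```

The returned position would then replace the particle only if it improves the fitness, keeping the exploitation side of the balance intact.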

EXPERIMENTAL RESULTS AND DISCUSSION
29 benchmark functions are used to validate the exploration and exploitation abilities of EO-GWO. Table 1 shows the details of the functions (Hu, Pan, and Chu, 2020): Limit is the boundary of the search space, Dim is the dimension, fmin indicates the theoretical optimum, and Type represents the feature of the benchmark function. They are composed of unimodal (f1-f7), multimodal (f8-f14), fixed-dimension (f15-f23) and composite (f24-f29) functions.
EO-GWO is compared with EO and two hybrid algorithms, WOA-SA (Mafarja and Mirjalili, 2017) and PSO-GWO (Mishra et al., 2020). Table 2 gives the details of the algorithms. The population size is 30, the maximum number of iterations is 500, and each algorithm is run 30 times. Table 3 shows the average values of the optimal solutions. Wilcoxon's rank-sum test is applied at a 5% significance level to judge whether the experimental results are statistically significant. Table 4 gives the results of Wilcoxon's rank-sum test with EO-GWO as the baseline: "-" means the compared algorithm is inferior to EO-GWO, "+" indicates that it is superior to EO-GWO, and "=" represents that the two algorithms have the same performance.
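Such a rank-sum comparison between the 30-run results of two algorithms can be carried out with SciPy; a minimal sketch with synthetic fitness samples (the data here is illustrative, not the paper's):

```python
import numpy as np
from scipy.stats import ranksums

rng = np.random.default_rng(42)
# Hypothetical best-fitness values from 30 independent runs of two algorithms
runs_a = rng.normal(0.0, 1.0, 30)
runs_b = rng.normal(1.0, 1.0, 30)

stat, p = ranksums(runs_a, runs_b)   # Wilcoxon rank-sum test
significant = p < 0.05               # 5% significance level, as in the paper
```

A "-", "+" or "=" entry in Table 4 then corresponds to combining this significance decision with which algorithm has the better mean.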

EXPERIMENTAL RESULTS AND ANALYSIS
As can be seen from Tables 3 and 4, EO-GWO has excellent performance. EO wins on 5 functions, while EO-GWO beats EO on 9 functions. Especially on the unimodal functions, the performance of EO-GWO is greatly improved, which shows that the further optimization of the four best solutions and the new position-updating method accelerate convergence and improve exploitation. WOA-SA is better than EO-GWO on 7 functions and worse on 16. PSO-GWO does not refine its best solution, and EO-GWO is superior to PSO-GWO on both multimodal and unimodal functions: EO-GWO is worse than PSO-GWO on only 4 functions and better on 16.
On the unimodal functions, WOA-SA excels at f1 and f2 but performs poorly on f3 and f4. EO-GWO achieves good results on f3, f4 and f5; especially on f5, the other algorithms stay far from the optimal value, while EO-GWO reaches the ideal result. WOA-SA and EO-GWO perform identically on f6, and PSO-GWO gets the worst result there. EO and EO-GWO obtain the best result on f7, where WOA-SA does not perform well. This illustrates that EO-GWO has a powerful exploitation ability when searching for the optimal solution in the known space. WOA-SA further enhances its best solution through simulated annealing (SA), but it does not succeed on the complex functions. The escaping strategy of EO-GWO avoids local traps and seeks solutions in more unknown regions. Although EO-GWO does not call the objective function for the refined solutions to adjust the search direction, it still performs excellently on the unimodal functions (f3-f7).
Multimodal functions have many local optima, so they are suitable for judging whether an algorithm escapes local minima. EO-GWO performs best on f9, f10, f11 and f12. EO achieves great results on f9, f11 and f13, WOA-SA has the best performance on f8 and f11, and PSO-GWO does well on f13. Overall, EO-GWO is significantly better than the other algorithms, which demonstrates that it can escape local traps and find the optimal solution in a wider space.
Fixed-dimension functions have only a few local optima and small dimensionality. The algorithms acquire the same results on f16. From the above discussion, the optimization ability is advanced by refining the four optimal solutions, and the new position-updating strategy manages to avoid local traps; EO-GWO improves the solution quality.

Table 5 shows the running time (seconds) of the algorithms. PSO-GWO has the shortest time, while EO-GWO and WOA-SA have higher time complexity because they add an optimization of the best solutions in each iteration. EO-GWO does not add evaluations of the objective function, so its time is less than WOA-SA's on most test functions. When the objective function is very simple, as with f15-f23, evaluating it takes little time; the structure of EO-GWO is more complex than WOA-SA's, and its advantage is not obvious there.

A unimodal function has a single global optimum and no local traps, so it is useful for judging convergence speed. As Figure 3 shows, EO-GWO beats the other algorithms. PSO-GWO has the slowest speed on f1 and f2; EO-GWO performs best before 400 iterations, and WOA-SA converges well in the later iterations. On f3 and f4, WOA-SA has the worst speed, and PSO-GWO does not perform well on f5, f6 and f7. EO-GWO achieves outstanding results on f3, f4 and f5. Its rate is faster than EO and WOA-SA on f6, and EO and EO-GWO have almost the same curve on f7. Although WOA-SA improves its best solution, it cannot search more of the space when the objective function is complicated. EO-GWO not only accelerates convergence but also searches for solutions in a wider space.

APPLICATION FOR SHORT-TERM TRAFFIC FLOW PREDICTION
Neural networks optimize their parameters through backpropagation (BP), but this method has inherent defects such as slow learning speed, low accuracy, and a tendency to fall into local minima. A meta-heuristic algorithm is a global optimization process: since it seeks the global optimum in a multi-dimensional search space, it is widely used to train the parameters of neural networks. The parameters of the neural network are first optimized by the meta-heuristic, and the obtained parameters are then refined further by the neural network itself.

Table 4. Wilcoxon's rank-sum test of the EO, WOA-SA and PSO-GWO on EO-GWO
The characteristics of traffic flow are mainly described by flow, vehicle speed and density, among which flow is particularly important because it intuitively reflects the operation of traffic; therefore, traffic flow is generally selected as the prediction parameter. This section adopts the LSTM and meta-heuristics to predict short-term traffic flow.

The Prediction Model of Traffic Flow Based on LSTM
The prediction is based on historical traffic flow data and uses a suitable method to build a reasonable mathematical model to predict future traffic flow. When forecasting short-term traffic flow at a highway intersection, the flow on a road section has an inevitable relationship with the previous several periods of its upstream sections, so they are related in both time and space. In this study, upstream data is utilized to predict the traffic flow of a specific road, and the following three models are applied to achieve short-term predictions, as shown in Figure 4, where f(x-1, t-1) represents the traffic flow of the intersection upstream of x at time t-1. The dimension of the meta-heuristic is the number of neural network parameters to be trained. The numbers of cells and outputs are 10 and 1; the numbers of inputs are 1, 2 and 3 for Models 1, 2 and 3 respectively, so the dimensions are 360, 400 and 440. LSTM uses the sum of squared errors as the fitness function by default; the following equation is adopted instead to improve the prediction accuracy, and the structure of the prediction model is shown in Figure 5.
$$fitness = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

where $n$ is the number of training samples, and $y_i$ and $\hat{y}_i$ are the real and predicted values of the $i$-th sample.
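Assuming a mean-squared-error form of the fitness (the exact equation is not fully recoverable from the source), the evaluation can be sketched as:

```python
import numpy as np

def fitness(y_true, y_pred):
    """Mean squared prediction error over the training data
    (lower is better for the meta-heuristic)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))
```

The meta-heuristic would decode each particle into LSTM weights, run the forward pass on the training set, and use this value as the particle's fitness.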

SIMULATION RESULTS
The data comes from part of Wilson Way OC in the PEMS data set of the California highway network and covers March 14, 2011 to May 29, 2011. The traffic volume is summarized at 5-minute intervals, so a traffic intersection has 288 data points per day. The data from the first 53 days is used for training and the remaining 30% for testing. The compared algorithms are EO, EO-GWO, WOA-SA, PSO-GWO, GA (Raza and Zhong, 2018) and PSO (Chan, Dillon, and Chang, 2012). Tables 6 and 7 depict the statistical results, including the prediction error (Error) and running time (Time); Tables 8 and 9 give the Wilcoxon's rank-sum test and Friedman test. Table 6 shows the prediction errors of the three models: the errors of WOA-SA are all smaller than those of EO, EO-GWO, PSO-GWO, GA and PSO. Compared to the other algorithms, EO-GWO improves the prediction accuracy by more than 13.26%, which illustrates the ability of the proposed algorithm to capture the trend of data changes and thus make accurate predictions.
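The 288-samples-per-day layout and the 53-day training split described above can be sketched as follows (a minimal illustration; the series here is synthetic and the function name is hypothetical):

```python
import numpy as np

SAMPLES_PER_DAY = 288   # 24 hours at 5-minute intervals
TRAIN_DAYS = 53

def split_series(flow):
    """Split a 5-minute traffic-flow series into training and test parts,
    using the first 53 days for training, as in the paper."""
    cut = TRAIN_DAYS * SAMPLES_PER_DAY
    return flow[:cut], flow[cut:]
```

Sliding windows of 1, 2 or 3 upstream values would then be built on each part to form the inputs of Models 1-3.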
The selection of hyperparameters for LSTM prediction models has an important impact on prediction accuracy. Traditional hyperparameter selection is largely random; this randomness can be reduced by forming the hyperparameters into a multi-dimensional solution space and obtaining the optimal parameter combination by searching that space. Optimizing the LSTM with meta-heuristics yields smaller evaluation errors, and WOA-SA has the best performance, which shows that WOA-SA improves the overall performance of the LSTM. The proposed model attains a prediction error of 38.75% with Model 3, indicating that better parameters are obtained during the iterative process. EO-GWO can effectively find the optimal parameter combination for LSTM traffic flow prediction and reduce the prediction error.
Tables 6 and 7 reveal that WOA-SA has the best prediction errors and GA has the longest running time. EO performs best in running time, and PSO-GWO is faster than EO-GWO. The non-parametric statistical analysis shows that the prediction error of WOA-SA is better than EO-GWO's in Model 1, while they obtain the same statistical results in Models 2 and 3; WOA-SA and EO-GWO beat the others in Model 2. EO is inferior to EO-GWO in Model 2, and they acquire the same prediction in Model 1. PSO-GWO, PSO and GA are not as good as EO-GWO in any of the three models. The Friedman test gives EO-GWO a rank value of 2 with a P-value of 2.4788E-03; since this is less than 0.05, the differences are statistically significant. In terms of running time, EO has the smallest time complexity in the three models. GA and PSO have the same statistical results as EO-GWO in Model 3 and are inferior to EO-GWO in Models 1 and 2. EO-GWO is superior to WOA-SA and PSO-GWO in Models 1, 2 and 3.
WOA-SA is suitable for identifying the optimal solution in the simple prediction model, while EO-GWO shows its advantages when the model is complex: it searches more of the space and reduces the evaluations of the objective function. EO-GWO strikes a good balance between prediction accuracy and running time. The algorithms have small prediction errors in Model 3 and large errors in Model 1, which indicates that the prediction accuracy of a model is related to the historical data it uses.
The algorithms obtain poor prediction results on the 4567th, 1205th, 1198th, 1037th, 900th, 1336th, 633rd, 578th, 1150th and 586th test data of the three models. This is because the data suddenly rises or falls and the model cannot make a good judgment. Taking the 1198th as an example, the input data in Model 3 is 20, 24, 23 and the label is 9, which deviates widely from the historical data; the predicted values of the model lie in [17, 18] and the error rates exceed 80%.

CONCLUSION
To advance the training accuracy and reduce the running time of LSTM, this paper proposes the improved EO-GWO algorithm. Experiments show that EO-GWO has strong optimization ability and high efficiency on the benchmark functions. Traffic flow prediction is based on historical traffic flow data: a suitable method is used to construct a reasonable mathematical model, which is then adopted to predict future traffic flow. LSTM is used to predict short-term traffic flow, and the EO-GWO algorithm is applied to train its parameters; EO-GWO achieves great results in the three traffic flow prediction models. In future work, weather, date, emergencies and other factors can be added to the prediction model to achieve better results, and the equilibrium optimizer and its variants can be applied to more deep learning structures.

DECLARATION OF COMPETING INTEREST
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.