Bicycle Balance Control Based on Tunicate Swarm Algorithm Optimized BP Neural Network PID

In this study, the authors introduce a novel approach that leverages the tunicate swarm algorithm (TSA) to optimize a proportional-integral-derivative (PID) controller based on a back propagation (BP) neural network. The core objective of the approach is to manage and counteract uncertainties and disturbances that may jeopardize the balance and stability of self-driving bicycles in operation. By using the self-learning capabilities of BP neural networks, the controller can dynamically adjust PID parameters in real time, which enhances robustness and reliability during operation. To further improve the efficiency of the controller, the authors use the TSA to optimize the initial weights of the neural network. This effectively mitigates the problems commonly associated with BP neural networks, namely slow convergence and entrapment in local minima. Simulations and experiments show that the TSA-optimized BP neural network PID controller improves dynamic performance and robustness. It also handles changes in the environment, such as wind and ground bumps. Therefore, the proposed controller design offers an effective solution to the balancing problem of self-driving bicycles and paves the way for designing versatile controllers with broad application potential.


INTRODUCTION
As a type of unmanned two-wheeled vehicle, unmanned bicycles have unique advantages, such as their agility and environmental friendliness. Therefore, they have great application prospects in various fields, such as avoiding traffic congestion, unmanned express delivery, and security patrols. These prospects have attracted increasing attention from experts and scholars to research on the motion balance control of unmanned bicycles (Su & Chen, 2020; Owczarkowski et al., 2019). Additionally, numerous solutions have been proposed to address the balance problem of unmanned bicycles, which can be broadly classified into two categories: with and without stabilizers.
When no stabilizer is added to an unmanned bicycle, the most typical method used (Cui et al., 2020; Huang et al., 2017; Yongli et al., 2020; Huang et al., 2017) is to control the vehicle's turning angle. This is done by combining the turning angle with the rear wheel speed of the bicycle to generate a centrifugal force that maintains its balance. Although this method does not require a stabilizer, it can only maintain balance during motion and not when the bicycle is stationary.
The stabilizer problem of self-driving bicycles has various solutions. Some stabilizers use the principle of a gyroscope, which utilizes the energy produced by a rapidly rotating flywheel and then controls the angle of the flywheel to produce a force that helps maintain balance (Zheng et al., 2022; Park & Yi, 2020; Różewicz & Piłat, 2020). This method can maintain a bicycle's balance when stationary and moving, and the flywheel response speed is fast, producing a large balance force. As suggested by CB et al. (2021), adding weight and adjusting its position on the bicycle can achieve balance by altering the bicycle's overall weight distribution. Although this method can also maintain a vehicle's balance when stationary and moving, adding weight increases its overall weight and size. The stabilizer used by Kim et al. (2015), Kien et al. (2023), Chiu and Wu (2020), Kim et al. (2013), and Kien et al. (2021) utilizes an inverted pendulum-based flywheel to generate a force that balances the gravitational forces that act on the vehicle. This is achieved by modulating the flywheel's rotation, providing fast response times and instantaneous force. Furthermore, the flywheel's slower speed at the equilibrium state results in reduced power consumption.
Given the advantages and disadvantages of the various solutions mentioned, this study aims to establish a bicycle model that uses an inertial wheel utilizing the concept of an inverted pendulum.
Recent research has examined control algorithms for unmanned bicycles. Zhang et al. (2023) proposed the Udwadia-Kalaba control method to balance unmanned bicycles by compensating for any deviation of the initial roll angle when the bicycle is stationary. This approach offers various benefits, such as rapid system response, low overshoot, and ease of control torque optimization. However, it only examined the stability of the bicycle when it was static and did not consider the balance problem when it moves forward. Li (2021) used cascaded PID control to integrate unmanned bicycle balance control and line-following control, resulting in enhanced control accuracy and faster system response. However, limited by the fixed PID parameters, PID control does not have a strong anti-interference ability when encountering disturbances during motion. Zhang et al. (2021) used the Kane method to establish an accurate mathematical model of unmanned bicycles and then proposed full-state feedback control that combines velocity correction and optimal control. They also used a Linear Quadratic Regulator (LQR) to control a bicycle's balance. However, this method highly depends on an accurate mathematical model. During the bicycle's forward movement, it may encounter environmental disturbances (e.g., uneven road surfaces and unsteady wind). Moreover, the system is a typical nonlinear and underactuated control system (Zhang et al., 2020).
To address the shortcomings of prior studies and these challenges, we propose a novel method for balancing unmanned bicycles using a BP neural network PID controller fine-tuned using the tunicate swarm algorithm (TSA). The BP neural network PID does not depend on a state-space model and can approximate complex nonlinear systems. The BP neural network PID controller can dynamically adjust and fine-tune itself to the system as the bicycle moves forward, enabling it to be highly robust and fault-tolerant. This allows the PID controller to handle external disturbances affecting the bicycle's balance effectively. However, the initial weight values of the BP neural network determine its convergence speed and its tendency to get stuck in local minima. Therefore, selecting appropriate initial weight values significantly impacts the performance of the entire PID controller. Typically, the network's initial weight values are randomly generated, which is highly uncertain and results in significant vibration and overshoot when the motor starts. Thus, the requirements of unmanned bicycle balance are difficult to meet. This study's innovation is the introduction of the TSA, which has a strong ability to jump out of local optima and a high optimization capability (Sharma et al., 2021), to globally optimize the initial weight values of a BP neural network. Then, fitness values are obtained from the Kp, Ki, and Kd values output by the network's forward pass and fed back to the controller, and the initial weight values are finally updated. The TSA compensates for the disadvantages of the BP neural network, namely its tendency to get stuck in local minima and its slow convergence speed (Wan & He, 2023), and enhances the dynamic performance of the BP_PID algorithm. This study uses simulations to compare the performance and robustness of the BP neural network PID controller optimized using the TSA with three other controllers, highlighting the advantages of the proposed algorithm.
The remainder of this paper is arranged as follows. Section 2 provides a dynamic model for unmanned bicycles. The unmanned bicycle controller and optimization algorithms are presented in Section 3. Section 4 presents the simulation comparisons and analyses of different controller algorithms. Section 5 validates the optimized controller algorithm on the experimental platform of unmanned bicycles. Section 6 discusses the simulation and experimental results of this study and the limitations of the proposed method. Finally, Section 7 concludes the paper and provides suggestions for future research.

ESTABLISHMENT OF A DYNAMIC MODEL FOR UNMANNED BICYCLES
The Lagrange equation (Yetkin et al., 2014) is an energy-based approach that characterizes the equations of motion of a system by considering the difference between the system's kinetic and potential energies. This method can transform a physical model into mathematical formulas, which can be used for simulation and control design. Specifically, unmanned bicycles are complex systems with nonlinear and strong coupling characteristics. In this study, a momentum wheel, which can switch rapidly between forward and reverse directions, generates torque to counteract the gravitational torque applied to the bicycle in a semi-stable state, thus achieving balance. Because the TSA only needs to optimize the initial weights of the BP neural network, and the bicycle is stationary during its initial startup phase, the transfer function derived from the static bicycle model suffices and greatly simplifies the unmanned bicycle model. The Lagrange equation is therefore employed in this study to establish the dynamic model of an inertial wheel-based unmanned bicycle in a static state.
The Lagrange equation is derived from the universal equation of dynamics and represents the differential equations of motion of a system of particles. The equation is expressed as:
$$\frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}_i}\right) - \frac{\partial L}{\partial q_i} = F_i, \qquad L = T - V$$
Here, $q_i$, $F_i$, $L$, $T$, and $V$ represent the generalized coordinates of the system, the generalized forces of the system, the Lagrangian function, the total kinetic energy, and the total potential energy, respectively. In Figure 2, $\theta$ represents the angle by which the bicycle deviates from the vertical line to the ground, and $\omega$ denotes the rotation angle of the inertial wheel. A coordinate system is established using $\theta$ and $\omega$ as generalized coordinates. The kinetic energy of the model is expressed in terms of the following quantities: $T_1$, kinetic energy of the bicycle frame; $J_1$, moment of inertia of the bicycle frame about point O; $T_2$, kinetic energy of the flywheel; $m_2$, mass of the inertial wheel; $J_2$, moment of inertia of the inertial wheel about its center; $l_2$, vertical distance from the center of mass of the inertial wheel GB to the ground; and $l_1$, vertical distance from the center of gravity of the entire bicycle GA to the ground (Figure 1).
$$V = (m_1 l_1 + m_2 l_2)\, g \cos\theta$$
Here, $V$ represents the total potential energy of the unmanned bicycle and $m_1$ is the mass of the bicycle frame.
Substituting the above expressions into the Lagrange equation yields the equation of motion of the system, in which $\tau$ represents the torque required to balance the unmanned bicycle.
The equation of motion is nonlinear. To simplify the controller design, the system is linearized at the operating point ($\theta = 0$), so that $\sin\theta$ is approximated by $\theta$. Consistent with the conventions of control theory, $u$ represents the input torque $-\tau$ of the controlled object. The resulting mathematical model of the system after linearization is a differential equation in $\theta$ and $u$ (Equation 9). Taking $\theta$ and $u$ as the output and input of this equation, respectively, and applying the Laplace transform gives the transfer function of the system.
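As an illustration, the derivation outlined in this section can be worked through under assumed kinetic energy expressions. The forms of $T_1$ and $T_2$ below are assumptions chosen to be consistent with the symbol definitions given here, with $\omega$ taken as the rotation angle of the inertial wheel relative to the frame. This is a sketch of how the steps fit together and may differ in detail from the model used in this study.

```latex
% Assumed kinetic energies (illustrative, not taken from the source):
T_1 = \tfrac{1}{2} J_1 \dot{\theta}^2, \qquad
T_2 = \tfrac{1}{2} m_2 l_2^2 \dot{\theta}^2
    + \tfrac{1}{2} J_2 \left(\dot{\theta} + \dot{\omega}\right)^2

% Potential energy and Lagrangian:
V = (m_1 l_1 + m_2 l_2)\, g \cos\theta, \qquad L = T_1 + T_2 - V

% Lagrange equation for q = \theta (no external generalized force):
\left(J_1 + m_2 l_2^2 + J_2\right)\ddot{\theta} + J_2 \ddot{\omega}
    - (m_1 l_1 + m_2 l_2)\, g \sin\theta = 0

% Lagrange equation for q = \omega (motor torque \tau acts on the wheel):
J_2 \left(\ddot{\theta} + \ddot{\omega}\right) = \tau

% Subtracting the second equation from the first eliminates \ddot{\omega}:
\left(J_1 + m_2 l_2^2\right)\ddot{\theta}
    - (m_1 l_1 + m_2 l_2)\, g \sin\theta = -\tau

% Linearizing at \theta = 0 (\sin\theta \approx \theta) with u = -\tau:
\left(J_1 + m_2 l_2^2\right)\ddot{\theta}
    - (m_1 l_1 + m_2 l_2)\, g\, \theta = u

% Laplace transform with zero initial conditions:
\frac{\Theta(s)}{U(s)} =
    \frac{1}{\left(J_1 + m_2 l_2^2\right) s^2 - (m_1 l_1 + m_2 l_2)\, g}
```

Under these assumptions the plant has one pole in the right half-plane, which reflects the inverted pendulum character of the stationary bicycle and is why closed-loop control is needed.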

TSA
The TSA is a novel optimization algorithm that draws inspiration from the propulsion and collective behavior of tunicates, sessile animals in the deep sea that navigate and forage through jet propulsion and group behavior (Kaur et al., 2020). The jet propulsion behavior primarily relies on an individual's gravity, the steady flow of interrupted water currents in the deep sea, and interaction forces between individuals to accomplish three tasks: avoiding conflicts between search agents, moving toward the position of the best search agent, and maintaining proximity to the best search agent. Conversely, the collective behavior is achieved through the tunicates' ability to perceive changes in water flow and the bioluminescence of their surrounding companions, allowing them to determine their positions and collectively converge toward the target food source. It can be generalized to update the position of all search agents with respect to the current best solution.
To ensure that search agents do not overlap or collide, a new position for each search agent is calculated using the vector $\vec{A}$. Here, $\vec{G}$, $\vec{F}$, and $\vec{M}$ represent the force of gravity, interrupted water currents in a steady flow, and interaction forces between individuals, respectively. The values of $c_1$, $c_2$, and $c_3$ are randomly generated within the range of [0, 1]. $P_{min}$ and $P_{max}$ are typically set to 1 and 4 (Zhang et al., 2023), respectively. They represent the initial velocity of the interaction between individuals and the subordinate velocity, respectively.
To move toward the position of the best search agent (represented by $\vec{FS}$), the search agents use the following formula:
$$\vec{PD} = \left| \vec{FS} - r_{and} \cdot \vec{P_p}(x) \right|$$
Here, $\vec{PD}$, $\vec{P_p}(x)$, and $r_{and}$ represent the distance between the food source and the search agent, the position of the search agent at the current iteration, and a random number between 0 and 1, respectively.
The search agents' orientation toward the direction of the optimal search agent is maintained. The position of a search agent updated relative to the direction of $\vec{FS}$ is denoted $\vec{P_p}(x')$. To simulate the collective behavior of swarming animals, the positions of the other search agents are updated based on the positions of the top two optimal search agents, which gives the new position $\vec{P_p}(x+1)$ of each search agent. Figure 3 presents the flowchart of the TSA.
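The TSA steps described in this section can be sketched in Python as follows. This is a minimal illustration and not the implementation used in this study. The update rules follow the commonly published form of the TSA (A = G / M, G = c2 + c3 - F, F = 2·c1, M = floor(P_min + c1·(P_max - P_min))), and the swarm step averages each candidate with the preceding agent, which is an implementation choice assumed here. The function name `tsa_minimize` and all of its parameters are hypothetical.

```python
import numpy as np


def tsa_minimize(fitness, dim, lower, upper, n_agents=20, n_iter=100,
                 p_min=1.0, p_max=4.0, seed=0):
    """Minimize `fitness` over the box [lower, upper]^dim with a basic TSA loop."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(lower, upper, size=(n_agents, dim))
    scores = np.array([fitness(p) for p in pos])
    best = int(np.argmin(scores))
    best_pos, best_score = pos[best].copy(), float(scores[best])
    history = [best_score]  # best fitness before and after each iteration

    for _ in range(n_iter):
        for i in range(n_agents):
            c1, c2, c3 = rng.random(3)
            F = 2.0 * c1                                # water-flow term
            G = c2 + c3 - F                             # gravity term
            M = np.floor(p_min + c1 * (p_max - p_min))  # interaction term (>= 1 here)
            A = G / M                                   # conflict-avoidance factor
            # Distance between the food source (best agent) and this agent
            PD = np.abs(best_pos - rng.random() * pos[i])
            # Move toward the best agent from either side
            if rng.random() >= 0.5:
                candidate = best_pos + A * PD
            else:
                candidate = best_pos - A * PD
            # Swarm behavior (assumed form): blend with the preceding agent
            if i == 0:
                pos[i] = candidate
            else:
                pos[i] = (candidate + pos[i - 1]) / (2.0 + c1)
            pos[i] = np.clip(pos[i], lower, upper)

        scores = np.array([fitness(p) for p in pos])
        best = int(np.argmin(scores))
        if scores[best] < best_score:
            best_pos, best_score = pos[best].copy(), float(scores[best])
        history.append(best_score)

    return best_pos, best_score, history
```

For the application in this paper, each agent position would hold the flattened initial weights of the BP neural network, and `fitness` would score the closed-loop response obtained with those weights.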

BP Neural Network PID
The incremental digital PID control algorithm can be expressed as follows:
$$u(k) = u(k-1) + \Delta u(k)$$
$$\Delta u(k) = K_P\,[e(k) - e(k-1)] + K_I\, e(k) + K_D\,[e(k) - 2e(k-1) + e(k-2)]$$
Here, $u(k)$, $u(k-1)$, $\Delta u(k)$, and $e(k)$ represent the output of the PID controller at the $k$th time step, the output at the $(k-1)$th time step, the difference between the two outputs, and the control error, respectively. Meanwhile, parameters $K_P$, $K_I$, and $K_D$ in the algorithm are dynamically adjusted in real time through a neural network. The square of the output error $e(k)$ is used as the evaluation reference index $E(k)$. The hidden layer uses the symmetrical Sigmoid function, a common choice of activation function. The output layer activation function is chosen as the positive Sigmoid function because $K_P$, $K_I$, and $K_D$ must be positive.
The network input is propagated through the hidden layer to the output layer, whose outputs are $K_P$, $K_I$, and $K_D$, and the weight values are updated by using a gradient descent method.
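A BP neural network PID of the kind described in this section can be sketched as follows. This is an illustrative example under several assumptions that are not stated in the source: the input vector is [r, y, e, 1], the hidden layer has five neurons, the symmetrical Sigmoid is tanh(x), the positive Sigmoid is (1 + tanh(x)) / 2, the index is E = e²/2, the sign of ∂y/∂u is a fixed constant, and no momentum term is used. The class name `BPPID` and its parameters are hypothetical.

```python
import numpy as np


class BPPID:
    """Three-layer BP network that outputs (Kp, Ki, Kd) for an incremental PID."""

    def __init__(self, n_in=4, n_hidden=5, lr=0.2, seed=0, weights=None):
        rng = np.random.default_rng(seed)
        if weights is None:
            # Random initial weights; these are what the TSA would replace
            self.w_hid = rng.uniform(-0.5, 0.5, (n_hidden, n_in))
            self.w_out = rng.uniform(-0.5, 0.5, (3, n_hidden))
        else:
            self.w_hid, self.w_out = (np.array(w, dtype=float) for w in weights)
        self.lr = lr
        self.e1 = 0.0  # e(k-1)
        self.e2 = 0.0  # e(k-2)
        self.u1 = 0.0  # u(k-1)

    def step(self, r, y, dy_du_sign=1.0):
        """Return the control output u(k) and the gains used at this step."""
        e = r - y
        x = np.array([r, y, e, 1.0])
        h = np.tanh(self.w_hid @ x)                # hidden layer, symmetrical Sigmoid
        net_out = self.w_out @ h
        gains = 0.5 * (1.0 + np.tanh(net_out))     # positive Sigmoid, values in (0, 1)

        # Incremental PID: the three terms multiply Kp, Ki, Kd respectively
        terms = np.array([e - self.e1, e, e - 2.0 * self.e1 + self.e2])
        u = self.u1 + float(gains @ terms)

        # Gradient descent on E = e^2 / 2 with respect to both weight layers
        g_prime = 0.5 * (1.0 - np.tanh(net_out) ** 2)
        delta_out = e * dy_du_sign * terms * g_prime
        delta_hid = (1.0 - h ** 2) * (self.w_out.T @ delta_out)
        self.w_out += self.lr * np.outer(delta_out, h)
        self.w_hid += self.lr * np.outer(delta_hid, x)

        self.e2, self.e1, self.u1 = self.e1, e, u
        return u, gains
```

In this sketch the gains lie in (0, 1); a practical controller would likely scale them to the range needed by the plant, and that scaling would also enter the gradient.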

TSA-Optimized BP_PID
The choice of initial weights in a BP neural network plays a crucial role in determining its convergence speed and tendency to get stuck in local minima (Zhang et al., 2022). Selecting appropriate initial weights is essential to ensure the optimal performance of the entire PID controller. Typically, a network's initial weights are randomly generated, leading to high uncertainty. To address this problem, the TSA performs global optimization on the initial weights of the BP neural network. The fitness value is obtained from the $K_P$, $K_I$, and $K_D$ values output by the network's forward propagation, and the initial weights $\omega$ are updated accordingly.
The unmanned bicycle controller employs cascaded PID for balance control. The angle loop utilizes TSA-optimized BP neural network PID, whereas the speed loop uses conventional PID to regulate the inertial wheel's speed. The angle of the unmanned bicycle's balance position serves as the target value, which is input into the angle loop. Consequently, the angle loop's output serves as the target value for the speed loop. Finally, the output of the speed loop is passed on to the inertial wheel, which counters the torque generated by bicycle tilting, thereby maintaining the bicycle's balance and stability. Figure 4 presents the block diagram illustrating this process.
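The cascade structure described here can be sketched as follows. For brevity, both loops use a fixed-gain positional PID, whereas the angle loop in this paper uses the TSA-optimized BP neural network PID. The class `PID`, the function `cascade_step`, the gains, and the sign conventions are all hypothetical and serve only to show how the output of the outer loop becomes the setpoint of the inner loop.

```python
class PID:
    """Fixed-gain positional PID used as a stand-in for both loops."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def cascade_step(angle_loop, speed_loop, target_angle, roll_angle, wheel_speed):
    """Run one control period of the cascaded angle and speed loops."""
    # Outer loop: roll angle error -> target speed of the inertial wheel
    speed_setpoint = angle_loop.update(target_angle, roll_angle)
    # Inner loop: wheel speed error -> command sent to the wheel motor
    motor_command = speed_loop.update(speed_setpoint, wheel_speed)
    return speed_setpoint, motor_command
```

A call such as `cascade_step(angle_loop, speed_loop, 0.0, roll, speed)` would run once per control period, with `roll` and `speed` read from the sensors.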
As presented in Figure 5 and Table 2, when subjected to a step input signal and no external disturbances, the TSA_BP_PID and PSO_BP_PID controllers exhibit shorter rise times than the BP_PID and PID controllers. Furthermore, the TSA_BP_PID controller has the shortest adjustment time, whereas the BP_PID controller has the smallest overshoot. That is, the overshoot of TSA_BP_PID is larger than that of BP_PID. However, considering adjustment time, rise time, and overshoot together, the dynamic performance of the TSA_BP_PID controller is better than that of the BP_PID controller, achieving the optimization effect. Moreover, the dynamic performance of TSA_BP_PID is slightly better than that of PSO_BP_PID.
Figure 6 indicates that, when subjected to an impulse disturbance at 1.5 s, the TSA_BP_PID controller achieved convergence within 0.2 s, which is similar to the convergence speed of the PSO_BP_PID controller but much faster than the other two controllers. In addition, the PSO_BP_PID, BP_PID, and PID controllers all exhibited some overshoot, but the TSA_BP_PID controller did not. Overall, these results demonstrate that the TSA_BP_PID controller exhibits superior robustness. Figure 7 shows the comparison of fitness curves between PSO_BP_PID and TSA_BP_PID. The results show that the PSO algorithm approaches its minimum fitness value of about 0.0137 at around three iterations. Meanwhile, the TSA approaches its minimum fitness value of about 0.013 at around 27 iterations. Figure 8 displays the average execution time for each algorithm, with PSO taking 14.965701 seconds for 100 iterations and TSA taking 5.960677 seconds for 100 iterations. Figure 9 shows the optimized values of Kp, Ki, and Kd obtained through PSO and TSA, with Kp = 17.5, Ki = 0.18, and Kd = 9.4 after PSO optimization, and Kp = 12.1, Ki = 0.67, and Kd = 3.6 after TSA optimization. We can conclude that although the TSA requires more iterations to approach its minimum fitness value, its execution time for 100 iterations is shorter than that of the PSO algorithm.
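The three step-response measures compared above (rise time, overshoot, and adjustment time) can be computed from a sampled response as sketched below. The definitions are assumptions, because the source does not state which ones were used: rise time is taken as the 10% to 90% interval, overshoot as a percentage of the target, and adjustment (settling) time as the first sample after which the response stays within a ±2% band. The function name `step_metrics` is hypothetical.

```python
import numpy as np


def step_metrics(t, y, target=1.0, band=0.02):
    """Return (rise_time, overshoot_percent, settling_time) of a step response.

    Assumes a positive target and a response that reaches 90% of it.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)

    # Overshoot as a percentage of the target (zero if never exceeded)
    overshoot = max(0.0, (y.max() - target) / abs(target) * 100.0)

    # Rise time: first crossing of 10% to first crossing of 90%
    t10 = t[np.argmax(y >= 0.1 * target)]
    t90 = t[np.argmax(y >= 0.9 * target)]
    rise_time = t90 - t10

    # Settling time: first sample after the last excursion outside the band
    outside = np.abs(y - target) > band * abs(target)
    if outside.any():
        last = np.nonzero(outside)[0][-1]
        settling_time = t[last + 1] if last + 1 < len(t) else float("nan")
    else:
        settling_time = t[0]

    return rise_time, overshoot, settling_time
```

Applying the same function to the sampled responses of all four controllers would give directly comparable values for a table of results.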

VALIDATION OF UNMANNED BICYCLE BALANCE
An experimental platform for an unmanned bicycle was constructed (Figure 10) to demonstrate the feasibility of the TSA_BP_PID control algorithm. The platform utilized an Intel NUC12WSKi500 as the control component and ran ROS on Ubuntu 18.04. The TSA_BP_PID control algorithm was applied to send signals via a USB-CAN module to an ODrive driver board, which controlled the rotation of the inertial wheel motor to generate a force that countered the gravitational force acting on the bicycle.
The TSA_BP_PID controller was used to maintain the balance of an unmanned bicycle on an uneven surface while simulating possible uncertainties, such as wind disturbances, through object impacts during actual movements. Figure 11 compares the measured actual roll angle of the bicycle using the

DISCUSSION
In this study, we investigated the performance of a BP neural network PID controller optimized using the TSA through simulation comparison and experimental platform validation. Our results indicate that the TSA-optimized BP neural network PID controller exhibited better robustness and dynamic performance than the unoptimized BP neural network PID controller and had a smaller overshoot and settling time than the BP neural network PID controller optimized by using a particle swarm optimization (PSO) algorithm. This outcome arises because the PSO algorithm has poor exploration ability in the search space, which leads it to become trapped in locally optimal solutions. Conversely, the search mechanism of the TSA provides a good balance between exploration and exploitation (Sharma et al., 2021). This results in better optimization performance of the BP neural network PID controller.

CONCLUSION
This study focused on the design of a balance controller for unmanned bicycles. First, we developed a mathematical model of an unmanned bicycle based on a momentum wheel and analyzed it to derive a transfer function. A new type of controller that integrates the TSA with the BP neural network PID control was proposed to optimize the initial weights of a BP neural network. In simulation comparisons, the TSA-BP-PID controller exhibited superior dynamic performance and robustness compared to the other three controllers. Finally, we undertook experimental investigations with a physical prototype to demonstrate how the controller can quickly restore balance on uneven road surfaces and under sudden impact disturbances. Uncertainty perturbations caused by environmental factors affecting unmanned bicycles' balance and stability were addressed. However, our proposed controller still has certain limitations. For example, we only optimized the initial weights of the BP neural network using the TSA. The BP neural network can also be optimized by adjusting the learning rate, increasing the network depth, and using regularization. These could be directions for future research. Future work could also explore integrating SLAM navigation technology with unmanned bicycles to enable autonomous navigation while maintaining balance and stability across various terrains. In addition, this investigation could enhance research in practical applications, such as unmanned delivery and security patrols.

Figure 1. Side view of a bicycle

Figure 4. Block diagram of the control system

Figure 7. Fitness curve

Figure 11. Roll angle during bicycle movement