Target Tracking Method for Transmission Line Moving Operation Based on Inspection Robot and Edge Computing

To address the low accuracy and high loss rate of traditional target tracking (TT) methods when applied to moving operations on transmission lines, a TT method for mobile operations on transmission lines based on an inspection robot and edge computing (EC) is proposed. First, the basic framework of the TT algorithm is proposed, and the TT system for mobile operations on the transmission line is developed on an edge device. Video information is collected by an intelligent inspection robot and sent to the TT system in the edge device for processing to obtain accurate data. Then, a deep residual network is used to solve the vanishing and exploding gradient problems caused by increasing network depth. The traditional deep residual network is improved by introducing an improved bidirectional feature enhancement network and a classification and regression subnet, which remedies the loss of position and texture information and enables accurate tracking of moving targets on the transmission line. Finally, real-time data on mobile operation targets are acquired with an intelligent inspection robot, and experimental verification is conducted. The proposed algorithm is compared with three other algorithms on the same data set through simulation experiments. The results show that the precision, recall rate, accuracy, and comprehensive evaluation index F1 of the proposed algorithm are the highest among the compared algorithms, reaching 93.8%, 90.2%, 83.8%, and 89.8%, respectively.


INTRODUCTION
With the rapid development of intelligent algorithms, the digitization, networking, and intelligent management and control of power transmission inspection have become a trend of industrial development (Bertrand et al., 2021; Liu et al., 2019; Qiong & Wang, 2019). Traditional transmission line management can no longer meet the needs of developing technologies. Intelligent equipment is needed to realize the storage, management, and analysis of line data and information (Ji et al., 2019; Markiewicz & Koperwas, 2022; Zhu et al., 2022). Analyzing and integrating the demands of intelligent operation and inspection for moving operations on transmission lines, and designing a corresponding high-accuracy TT method, are important for realizing data-based maintenance and inspection of transmission lines. This is also a current focus of research in the industry (B. Chen et al., 2020; Kim & Choi, 2019; Lei et al., 2022).
Moving operation on a transmission line refers to non-fixed-point work on the line. The moving work target must be tracked to achieve intelligent inspection of moving operations on the transmission line (W. Chen et al., 2020; Song & Hua, 2022; Xin et al., 2022). This is important for the digitization of power transmission, maintenance, inspection, acceptance, and intelligent management. EC and deep learning are currently the most forward-looking methods in the field of automatic detection and tracking of moving objects (Dayyala et al., 2022; Sharma & Kumar, 2020; Wu et al., 2019). Together they can not only achieve high-precision mobile job TT (Ou et al., 2020; S. Wang et al., 2019) but also reduce the computing pressure on back-end equipment to ensure system stability and security (Song et al., 2021).
To detect and track small targets in aerial images, Ou et al. (2020) proposed a method for detecting and tracking ground targets in unmanned aerial vehicle (UAV) images by improving deep learning technology. This method can recover tracking automatically after the target is lost, but its anti-noise performance is poor. Li et al. (2020) proposed a method for multi-target detection and tracking. This method is based on data association combined with the You Only Look Once version 3 (YOLOv3) basic model, and it can effectively mitigate the occlusion problem in target tracking. However, the problem of target dislocation caused by large occlusion cannot be completely solved. Wang and Chang (2021) realized accurate compensation of a moving background based on symmetrical matching and adaptive outlier filtering, and gave corresponding accurate target detection and tracking methods. Based on this method, a hexapod robot was designed to achieve dynamic target tracking. This method, however, only studies the applicability of the tracking algorithm in the field of mobile robots, and its tracking accuracy is low. Based on an adaptive Gaussian mixture model, Zhang et al. (2020) detected human moving targets in track and field using background subtraction, and gave a corresponding dynamic target tracking method by introducing a deep learning algorithm. The method is only applicable to tracking large targets such as a human body in track and field, and its target loss rate in other fields is high. Guo et al. (2021) developed a new fusion information detection and tracking system by analyzing target detection, target tracking, and trajectory prediction. The system can detect and track the moving target and record its position information. This method, however, has a high target loss rate and cannot automatically recover the lost target. Based on the echo model of the tracking target, Tan et al. (2019) used mathematical analysis to capture the fuzzy features of small targets and, on this basis, designed a method for small-target tracking. However, the method does not consider the complexity of the deep feature mapping relationship, which causes tracking target loss. Because automatic tracking is difficult with certain algorithms, Yang et al. (2020) built an automatic TT system based on the INT8 quantization method by improving the kernel correlation filter with image analysis. This method relies on the real-time collection of video data from physical cameras, and its reliability needs improvement.
Once a moving target operating on an overhead transmission line breaks down or functions incorrectly, it is likely to endanger the safe and normal operation of the line. This will not only lead to economic losses but also reduce the credibility of the power supply company. The moving target on an overhead transmission line has fixed track characteristics but high accuracy requirements.
Based on the above analysis, to solve the problems of high loss rate and low accuracy in traditional TT methods, a TT method for moving operations on transmission lines based on EC and deep residual learning is proposed. The basic ideas are as follows: 1) based on edge computing, a basic framework of the tracking algorithm for moving targets on transmission lines is proposed; and 2) an improved bidirectional feature enhancement network and a classification and regression subnet are introduced into the traditional deep residual network to optimize the target tracking algorithm. Compared with traditional moving TT algorithms, the innovation of the proposed method lies in the following:
1. The accuracy and reliability of data collection are improved by using an intelligent inspection robot to collect real-time data of mobile operation targets.
2. The target tracking system is developed on edge devices to share the cloud computing pressure and to reduce the network delay. The introduction of the deep residual network model solves the problem of vanishing and exploding gradients as the network depth increases.
3. The deep residual network is improved by introducing a bidirectional feature enhancement network and a classification and regression subnet, which remedies the loss of position and texture information after multi-layer convolution and downsampling and improves the accuracy of the algorithm.

TRACKING SYSTEM FOR MOVING TARGET OF POWER TRANSMISSION LINE
Edge computing places tasks at the network edge, shares the pressure of cloud computing, relieves cloud load, and cooperates with cloud computing to provide various types of intelligent services. Compared with the traditional method of uploading data to the cloud computing platform, EC is characterized by high security, minimal network delay, good real-time performance, minimal bandwidth usage, and a strong self-healing ability. It can also handle privacy issues. The latency of EC is lower than that of cloud computing, including mobile cloud computing, when the upper limit of computing capacity is not considered (Li & Huang, 2019).
Using the network environment of EC, information transfer and invocation between nodes and parallel tracking across nodes can be realized. In the parallel tracking process, the target position is determined according to the response values of multiple position nodes, which makes the selection of the TT position more reliable. Embedding the algorithm into the tracking process has the following two requirements: 1) time complexity: the parallel processing of the main node and the generation node should not affect real-time tracking; and 2) space complexity: the principle of reasonability is followed, so that the space complexity of the algorithm does not increase excessively.
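The parallel selection step described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each position node reports a single peak response value together with a candidate position, and that the candidate with the highest response is chosen; the text does not specify the actual fusion rule, and the names `node_response` and `select_position` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def node_response(node):
    """Hypothetical stand-in for one node's tracker output.

    Returns the node's peak response value and the position where it occurred.
    """
    return node["response"], node["position"]


def select_position(nodes):
    """Query all nodes in parallel and keep the position with the highest response.

    Assumption: 'highest response wins' is only one possible fusion rule.
    """
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        results = list(pool.map(node_response, nodes))
    best_response, best_position = max(results, key=lambda r: r[0])
    return best_position, best_response


# Illustrative values only, not measured data.
nodes = [
    {"position": (120, 64), "response": 0.41},
    {"position": (124, 66), "response": 0.87},
    {"position": (131, 70), "response": 0.52},
]
best_position, best_response = select_position(nodes)
```

Because every node is queried concurrently, the time cost of this step is bounded by the slowest node rather than the sum over nodes, which is in line with the time-complexity requirement stated above.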
This study addresses the tracking of moving targets on transmission lines. To avoid security risks, network congestion, bandwidth consumption, difficulty in troubleshooting, server failures, and erroneous data generation, EC is used for tracking the moving targets on the transmission line. By adding edge devices to the front end, the mobile TT system on the transmission line is developed based on edge devices. Video information is collected by an intelligent inspection robot and sent to the target tracking system in the edge device for processing. Data is then obtained and transmitted to the management center in the computer room. The management center then sends the video stream to the GPU server to complete pedestrian detection, tracking, and passenger flow data statistics. The GPU server sends the processed passenger flow data to the storage server. At the same time, the management center extracts the processed passenger flow data and sends it to the cloud computing center for data sharing.
The edge device uses a GPU embedded development platform, which not only meets the computing power requirements of the system but also processes multiple channels of information at the same time, reducing equipment costs and improving detection efficiency. The mobile operation TT system on the transmission line under the EC mode processes the data at the front end, which not only reduces the computing pressure on the back-end equipment but also reduces the number of interactions between devices, ensuring the stability and security of the system. Its architecture is shown in Figure 1.
The basic structure of the intelligent inspection robot shown in Figure 1 is broken down in Figure 2. The robot adopts a data collector integrated with a hybrid differentiator to ensure the accuracy of data acquisition (J. Li et al., 2019). The automatic line inspection robot is equipped with an identifier, which, combined with the Kirin chip, ensures the communication speed between the device and the server. The effective identification range is 100-1000 m, the power consumption is less than 0.77 W, the azimuth accuracy error is ± 0.01, the identification time is less than 2 seconds, the device's working harmonic frequency is 3.56 MHz, the device's storage space is 64 kB in flash mode, the barrier-free working time is at least one year, and the device's ambient noise is less than 30 dB. To ensure the working efficiency of the identifier, the device uses wireless communication to complete target location and identification, which improves the accuracy and reliability of data acquisition.

TT ALGORITHM FOR MOVING OPERATION OF TRANSMISSION LINE
The computing power of edge devices is far less than that of back-end servers, so the model of a TT algorithm based on EC needs to be streamlined and lightweight. At the same time, its detection accuracy should meet the application requirements, and its structural parameters and amount of computation should not exceed the capabilities of edge devices, so as to ensure real-time detection without reducing the detection accuracy (J. Wang et al., 2019).
The TT algorithm needs to associate the targets detected in each image frame across frames and match each trajectory to its target. For a newly appearing target, a tracking trajectory needs to be created. The algorithm also needs to terminate the tracking trajectory of a target that has left the detection area.
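These three duties (associate, create, terminate) can be sketched with a simple greedy matcher. This is only an illustration under stated assumptions: boxes are `(x1, y1, x2, y2)` tuples, association uses intersection over union (IoU), and the thresholds `iou_thresh` and `max_missed` are hypothetical parameters rather than values from this study.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def update_tracks(tracks, detections, next_id, iou_thresh=0.3, max_missed=5):
    """Process one frame: match, create, and terminate trajectories.

    tracks maps a track id to {"box": box, "missed": frames without a match}.
    The dict is updated in place; the next unused track id is returned.
    """
    unmatched = list(range(len(detections)))
    for tid, trk in list(tracks.items()):
        # Greedy choice: the remaining detection that overlaps this track most.
        best = max(unmatched, key=lambda j: iou(trk["box"], detections[j]), default=None)
        if best is not None and iou(trk["box"], detections[best]) >= iou_thresh:
            trk["box"], trk["missed"] = detections[best], 0
            unmatched.remove(best)
        else:
            trk["missed"] += 1
            if trk["missed"] > max_missed:
                del tracks[tid]  # target is assumed to have left the detection area
    for j in unmatched:
        # A detection that matches no trajectory starts a new one.
        tracks[next_id] = {"box": detections[j], "missed": 0}
        next_id += 1
    return next_id


tracks = {}
next_id = update_tracks(tracks, [(0, 0, 10, 10)], 0)
next_id = update_tracks(tracks, [(1, 1, 11, 11), (50, 50, 60, 60)], next_id)
```

After the second call, track 0 has followed its slightly shifted box and the second detection has opened track 1. A production tracker would typically replace the greedy choice with a globally optimal assignment.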
Traditional algorithms generally distinguish the foreground and background of the target, in which the target to be tracked is the foreground and the other parts are the background. By transforming the tracking problem into a binary classification problem, the entire tracking algorithm is divided into two parts: 1) feature extraction and classification and 2) tracking once the target is detected. However, the traditional TT algorithm increases the amount of computation, which is not suitable for edge devices. An algorithm based on deep learning is therefore considered, in which the target feature extraction can directly use the results of the preceding target detection.

Deep Residual Networks
In a deep learning network, the image features undergo multiple linear and non-linear operations, and the deeper the network, the stronger its representation ability. The depth of the neural network therefore affects the strength of the learned representation: in principle, the deeper the network is, the better it can process images. Conventional convolutional neural network (CNN) models often stack plain structures to increase the depth of the network. As the number of model layers increases, the demand for the number of samples also increases. The performance of the network, however, tends to remain unchanged or even decrease as the number of layers increases, and the problem of performance degradation may occur. For example, in a CNN model, once the network reaches a certain depth and the number of layers increases further, the classification performance does not improve, the convergence of the network gets slower, the classification accuracy becomes worse, and gradient problems appear. The reason for such problems is that the gradient gradually vanishes during propagation through the deep network, which makes it impossible to adjust the weights of earlier layers.
The deep residual network (ResNet) model was proposed to solve the problem of vanishing gradients as the network depth increases. The model reduces the difficulty of network fitting by adding residual connections to the CNN and reduces the overall amount of computation. The residual unit is generally used to solve the degradation problem of deep networks, and its schematic diagram is shown in Figure 3. A residual unit has a skip connection directly from its input to its output. The importance of directly transferring shallow information to deep layers is that a network with skip connections is easier to optimize than a network without them.
In Figure 3, $x$ represents the input of ResNet, $F(x) + x$ represents the output of ResNet, and $F(x)$ represents the multiplication and addition of data in the network. If the output of the optimal fitting result of the convolutional network is $G(x) = F(x) + x$, then the optimal $F(x)$ is the residual between $G(x)$ and $x$, and the network effect is optimized by fitting the residual. In the residual network, it is easier to learn $F(x)$ toward 0, which can be achieved by L2 regularization.
Therefore, redundant blocks can simply reduce to the identity mapping, so the network performance will not be degraded. The final residual network model is obtained by connecting multiple residual modules in series.
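The behavior of the skip connection can be made concrete with a minimal sketch in plain Python. It is not the network used in this study: the residual branch is represented by an arbitrary function `f` standing in for the convolutional layers, and the helper names are hypothetical. The point is only that when the branch output is driven to zero, the unit returns its input unchanged, so stacking such units does not alter the signal.

```python
def residual_unit(x, f):
    """Return F(x) + x: the skip connection adds the input to the branch output."""
    fx = f(x)
    return [a + b for a, b in zip(fx, x)]


def zero_branch(x):
    """Residual branch whose weights have been driven to zero, so F(x) = 0."""
    return [0.0 for _ in x]


def small_branch(x):
    """Illustrative non-zero residual branch, F(x) = 0.1 * x."""
    return [0.1 * v for v in x]


x = [1.0, -2.0, 3.0]

# With F(x) = 0 the unit is exactly the identity mapping, even when stacked.
out = x
for _ in range(3):
    out = residual_unit(out, zero_branch)

# With a non-zero branch the unit outputs the input plus a learned correction.
corrected = residual_unit(x, small_branch)
```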
The core idea of the residual network is to add an identity mapping structure, which adds a parallel skip connection to the stacked weight layers (Lu et al., 2021). This network structure avoids the difficulty of backpropagating the error as the deep learning network deepens. In theory, residual modules can be stacked indefinitely without degrading the performance of the network. In practice, however, when the network depth reaches 50 layers or more, the unit block of the residual network adopts the bottleneck structure, as in the residual network-50 (ResNet-50). Figure 4 shows its structure.
The bottleneck structure consists of three convolutional layers, where the middle one is a convolutional layer of size 3 × 3, and the upper and lower ones are convolutional layers of size 1 × 1. Such a structure reduces the channel dimension of the feature map processed by the 3 × 3 layer, which improves the training efficiency. The deep residual network effectively solves the problem of vanishing or exploding gradients as the network depth increases. In addition, the deep residual network improves the network performance as the network deepens. With such a residual structure, the computational load of the network can also be reduced. Moreover, ResNet-50 converges faster and performs better than conventional stacked networks during model training.
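The saving offered by the bottleneck can be checked with simple arithmetic. The sketch below counts convolution weights only (no biases or normalization parameters) and assumes the channel widths commonly quoted for ResNet bottlenecks, 256 channels reduced to 64 and restored to 256; these widths are an assumption for illustration and are not stated in this paper.

```python
def conv_params(kernel, c_in, c_out):
    """Number of weights in a kernel x kernel convolution (biases ignored)."""
    return kernel * kernel * c_in * c_out


# Bottleneck: 1x1 reduces 256 -> 64 channels, 3x3 works on 64, 1x1 restores 256.
bottleneck = (
    conv_params(1, 256, 64)
    + conv_params(3, 64, 64)
    + conv_params(1, 64, 256)
)

# Plain alternative: two 3x3 convolutions applied directly to 256 channels.
plain = 2 * conv_params(3, 256, 256)

ratio = plain / bottleneck
```

Under these assumptions the bottleneck needs 69,632 weights against 1,179,648 for the plain pair, roughly a 17-fold reduction, which is why deeper residual networks adopt it.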

Improved Method of Deep Residual Networks
In a deep network, shallow features carry clearer position and texture information of objects, but after multi-layer convolution and downsampling, this position and texture information is lost. To balance the accuracy and running speed of the proposed method, the traditional residual network is optimized by introducing an improved bidirectional feature enhancement network and a classification and regression subnet.

Improved Bidirectional Feature Reinforcement Network
The structure diagram of the improved bidirectional feature enhancement network is shown in Figure 5.
The bidirectional feature enhancement network still uses the extracted feature pyramid, and the feature maps of different resolutions are enhanced through bottom-up and top-down paths. The number of channels at each pyramid level is the same after the stacking (concatenation) and convolution operations, but the feature maps at each level are enhanced. The design is as follows. A 1 × 1 convolution is performed on C3 to obtain P3. A 1 × 1 convolution is performed on C4, and the output is concatenated with the features of P3 after twofold downsampling to obtain P4. A 1 × 1 convolution is performed on C5, and the output is concatenated with the features obtained by fourfold downsampling of P3 and the features obtained by twofold downsampling of P4 to obtain P5.
To reduce noise interference as much as possible, a 3 × 3 convolution kernel is slid with stride 2 over the deeper feature map P5, and P6 is calculated by traversing the entire map. An activation unit followed by a 3 × 3 convolution with stride 2 is then applied to P6 to obtain P7. At this point, the improved bidirectional feature enhancement network has processed the feature maps of all pyramid levels. The sizes of P3, P4, P5, P6, and P7 are 1/8, 1/16, 1/32, 1/64, and 1/128 of the original image, respectively.
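As a worked example of these scale factors, the following sketch computes the side length of each pyramid level for the 800 × 800 pixel images used in the experiments. It assumes that each stride-2 step is padded so that odd sizes round up, which is a common convention but is not stated in the text; the function name `pyramid_sizes` is hypothetical.

```python
import math


def pyramid_sizes(image_size, levels=range(3, 8)):
    """Side length of each pyramid level P_l, whose stride is 2**l.

    Assumption: padding makes every downsampling step round up.
    """
    return {f"P{level}": math.ceil(image_size / 2 ** level) for level in levels}


sizes = pyramid_sizes(800)
```

For an 800-pixel side this gives 100, 50, 25, 13, and 7 for P3 through P7, that is, 1/8 down to roughly 1/128 of the input.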

Classification and Regression Subnet
The feature maps of five different scales (P3, P4, P5, P6, and P7) obtained by the improved bidirectional feature enhancement network are the inputs of the two task-specific subnetworks. The classification subnet uses the classifier to calculate the category and classification score of the candidate target. The regression subnet is responsible for making the candidate boxes as close as possible to the ground truth through the regression process of translation and scaling transformations.
To cover targets densely across the image, anchor boxes with areas of $512^2$, $256^2$, $128^2$, $64^2$, and $32^2$ are set at the various levels of the feature pyramid. Anchor boxes have three different aspect ratios: 1:2, 1:1, and 2:1. Three different scales of $2^0$, $2^{1/3}$, and $2^{2/3}$ are set for each aspect ratio, so there are nine different anchor boxes for each layer. The size range that can be covered by all levels of the pyramid is from 32 to 813 pixels.
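The anchor configuration and the quoted 32 to 813 pixel range can be reproduced with a short sketch. It assumes that each aspect ratio is interpreted as height divided by width and that the area of an anchor is preserved across aspect ratios; both are common conventions rather than details given in the text, and the names below are hypothetical.

```python
import math

BASE_SIZES = [32, 64, 128, 256, 512]               # one base size per pyramid level
RATIOS = [0.5, 1.0, 2.0]                           # assumed to mean height / width
SCALES = [2 ** 0, 2 ** (1 / 3), 2 ** (2 / 3)]      # three scales per aspect ratio


def anchors_for_level(base):
    """Return the nine (width, height) anchors for one pyramid level."""
    anchors = []
    for scale in SCALES:
        area = (base * scale) ** 2                 # area is kept fixed across ratios
        for ratio in RATIOS:
            width = math.sqrt(area / ratio)
            height = width * ratio
            anchors.append((width, height))
    return anchors


level_anchors = anchors_for_level(32)
smallest = round(min(BASE_SIZES) * SCALES[0])      # 32
largest = round(max(BASE_SIZES) * SCALES[-1])      # 512 * 2**(2/3), about 813
```

The upper bound follows from $512 \times 2^{2/3} \approx 812.7$, which rounds to the 813 pixels stated above.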
The classification subnet shares parameters across the pyramid levels and predicts, for each of the M anchor boxes, the probability of belonging to each of the C target categories. Four fully convolutional layers of size 3 × 3 × 256 are connected to the output of each layer in the pyramid, and the last layer uses a convolution of size 3 × 3 × MC to convert the output dimension to MC. This means that each anchor box corresponds to a C-dimensional vector, which stores the probability of belonging to each category. In the classification task, the category with the highest probability score is denoted as 1, and the rest as 0. Formula (1) is the focal loss function, whose role is to counter sample imbalance. In formula (1), $p_0$ represents the predicted probability value of the output; $\beta$ represents the weight factor, whose role is to suppress the imbalance between the numbers of positive and negative samples; and $\lambda$ represents a parameter whose role is to control the imbalance between the numbers of easy and hard samples.
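For orientation, the focal loss in its commonly published form can be written with the symbols defined above as $FL(p_0) = -\beta (1 - p_0)^{\lambda} \log(p_0)$, assuming that $p_0$ is the predicted probability of the true class. The sketch below implements this standard form; the default values 0.25 and 2.0 are the settings usually quoted for focal loss and are assumptions, not parameters reported in this study.

```python
import math


def focal_loss(p0, beta=0.25, lam=2.0):
    """Standard focal loss for one sample: -beta * (1 - p0)**lam * log(p0).

    p0 is assumed to be the predicted probability of the true class.
    beta weights the class; lam down-weights easy, well-classified samples.
    """
    p0 = min(max(p0, 1e-12), 1.0)  # guard against log(0)
    return -beta * (1.0 - p0) ** lam * math.log(p0)


easy = focal_loss(0.9)   # confidently correct sample, contributes little
hard = focal_loss(0.1)   # badly classified sample, dominates the loss
```

With `lam=0` and `beta=1` the expression reduces to the ordinary cross-entropy term, which shows that the factor $(1 - p_0)^{\lambda}$ is what shifts the training focus toward hard samples.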
The regression subnet is also implemented using a fully convolutional network to predict the offset of each anchor box from its corresponding ground-truth box location. After the output of each layer in the pyramid, four fully convolutional layers of size 3 × 3 × 256 are connected, and the last layer uses a convolution of size 3 × 3 × 4M to convert the output dimension to 4M, that is, four offsets for each of the M anchor boxes. The classification and location regression tasks are thus accomplished using two subnetworks. There are many prediction boxes with different scores near an object, but in the end, only the box with the highest class confidence score is retained as the detection result. A non-maximum suppression operation is used to remove surrounding redundant boxes, ensuring that the highest quality detection boxes are obtained.
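The non-maximum suppression step can be sketched as follows. This is the standard greedy procedure in plain Python rather than the implementation used in the experiments; boxes are assumed to be `(x1, y1, x2, y2)` tuples, and the overlap threshold of 0.5 is an illustrative choice.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

Here the second box overlaps the first with an IoU of about 0.68 and is suppressed, while the distant third box survives.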
The structural framework of the improved deep residual network suitable for the detection and tracking of moving targets in transmission lines is shown in Figure 6.

EXPERIMENT ANALYSIS

Simulation Environment and Parameter Settings
The specific hardware and software simulation environment is shown in Table 1.

Datasets and Evaluation Metrics
To evaluate the performance of the proposed tracking algorithm more accurately, an automatic line inspection robot is used to track and photograph the operation process of the moving operation target of the transmission line in real time. The robot collected 1,000 images, each 800 × 800 pixels, of moving objects on the transmission line. An annotation tool is used to manually annotate all images and save the position and size information of the moving target in each image. For the experiment, the saved images and the corresponding label information (image name, storage path, target, ...) are randomly divided into three parts, as listed in Table 2.
The indicators include the precision (P) of target detection, the accuracy (A), the recall rate (R), and the comprehensive evaluation index F1. The F1 score is the harmonic mean of P and R, which better reflects the overall performance of the model. Its maximum value is 1 and its minimum value is 0. Their calculations are given in formulas (2), (3), and (4). In these formulas, $T_P$ represents the number of pixels in the tracking target change category with correct results, and $F_P$ represents the number of non-change pixels of the tracking target with correct results.
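As an illustration of how such indicators are computed, the sketch below uses the standard confusion-matrix definitions of precision, recall, accuracy, and F1. It assumes the conventional meaning of the four counts (true positives, false positives, true negatives, false negatives), which is an assumption and may not match the exact form of formulas (2) to (4); the counts in the example are hypothetical.

```python
def detection_metrics(tp, fp, tn, fn):
    """Standard precision, recall, accuracy and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R
    return {"P": precision, "R": recall, "A": accuracy, "F1": f1}


# Hypothetical counts chosen only to exercise the formulas.
metrics = detection_metrics(tp=90, fp=10, tn=80, fn=20)
```

Because F1 is a harmonic mean, it always lies between P and R and is pulled toward the smaller of the two.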

Simulation Analysis
The data set in Table 2 is used to train the proposed TT network model for transmission line moving operations based on EC and deep residual learning. The variation of the training loss with the number of iterations is shown in Figure 7. In Figure 7, the loss of the proposed model fluctuates strongly on the validation set in the initial stage of training. This is because the network is not yet sufficiently trained, which makes it difficult to distinguish between true and false samples, so the loss value fluctuates. As the number of iterations increases, the loss curve gradually becomes stable, indicating that the network is continuously converging. In the early training on the training set, the loss does not change significantly, and the loss curve is stable. The overall performance is improved. The process on the test data set is always stable, and the final parameter accuracy of the tested network model is superior.
The variation curve of the overall accuracy of the proposed transmission line moving TT algorithm with the number of iterations is shown in Figure 8.
Figure 8 shows that, when training the proposed algorithm model, the accuracy on both the validation and training sets increased rapidly in the early stage of training and rose to a stable value after the 20th iteration. This demonstrates that the generalization ability of the proposed algorithm is good. After the 20th iteration, the accuracy of moving target detection and tracking is stable or fluctuates around the stable value, and the overall accuracy reaches its maximum value. At this time, the network model has converged. On the test set, the overall accuracy reaches a stable value and remains stable after 15 iterations, which indicates that the trained model performs well and the relevant parameters are accurate.
To demonstrate the advantages of the proposed method more definitively, it is necessary to compare it with other methods. The methods chosen for comparison include the TT method built by Li et al. (2020) based on the classic YOLOv3 model and the TT method proposed by Wang and Chang (2021). In Figure 9 and Figure 10, the proposed mobile job TT algorithm has the highest precision, recall, accuracy, and comprehensive evaluation index F1 compared with the other three algorithms, reaching 93.8%, 90.2%, 83.8%, and 89.8%, respectively. Compared with the maximum value among the other three methods, the proposed method is superior by 4.41%, 5.06%, 9.15%, and 8.33%. The results show that the proposed tracking method for moving targets of transmission lines can better detect and track targets. This is because the network environment of EC can realize the transfer and invocation of information between nodes and the parallel tracking of nodes. The target position can be determined according to the response values of multiple position nodes, so that the selection of the TT position is more reliable. The introduction of the improved bidirectional feature enhancement network and the classification and regression subnet remedies the loss of location and texture information in the deep residual network. Through the regression process of translation and scaling transformation, the candidate box is brought as close as possible to the real target, which improves the accuracy of target detection and tracking.

CONCLUSION
To address the low accuracy and high tracking loss rate of traditional TT methods, a TT method for moving operations on transmission lines based on EC and deep residual learning is proposed and verified by simulation experiments. The results show that developing the tracking system for moving targets on the transmission line on edge devices can effectively reduce network delay and the amount of computation. Using the deep residual network as the basic model can effectively solve the vanishing and exploding gradients caused by increasing network depth. The introduction of the bidirectional feature enhancement network and the classification and regression subnet can effectively avoid the loss of target location information and improve the accuracy of target detection and tracking. Future work needs to conduct in-depth research on problems such as motion blur in long-term TT tasks, multi-target tracking, and TT under severe occlusion.

Figure 2. Basic structure of the intelligent inspection robot

Figure 5. Structure diagram of the improved bidirectional feature enhancement network

Figure 6. Structure diagram of improved deep residual network

Figure 7. Training loss curve of the proposed method: EC

Figure 8. Training loss curve of the proposed method: TT algorithm

Table 2. Simulation experiment dataset (columns: Name, Number of Images, Number of Targets)
$T_N$ represents the number of pixels in the tracking target change category with incorrect results, and $F_N$ represents the number of non-change pixels of the tracking target with incorrect results.