1 Introduction
Visual tracking is an important research area in computer vision, critical for many applications including surveillance, traffic monitoring, video indexing, human-machine interaction, and autonomous vehicle driving. Although existing trackers have achieved impressive progress in recent years, designing a robust tracker remains a challenging problem. In practice, probabilistic approaches (Kristan et al., 2008; Pérez et al., 2002) that globally model the tracked object's appearance have proven very successful. However, significant appearance changes caused by factors that commonly occur in real-life scenarios, such as occlusion, scale variation, fast motion, deformation, and illumination variation, pose serious problems for such models. These factors lead to reduced matches and drifting, which eventually cause the tracker to fail. Updating the visual model (Babenko et al., 2011; Kalal et al., 2010; Bolme et al., 2010; Grabner et al., 2006) can improve a tracker's performance, but it also raises additional questions: when should the visual model be updated, and which parts of it should be retained? With the advent of high-performing object detection models (Ren et al., 2015; Zhou et al., 2019), a powerful alternative emerged: tracking-by-detection (Zhou et al., 2020; Tang et al., 2017). Tracking-by-detection leverages the power of deep-learning-based object detectors and is currently the dominant tracking paradigm. However, even the best object trackers are not without drawbacks.
This paper presents a method for correcting object tracking coordinates based on optical flow and a convolutional network called CenterNet (Zhou et al., 2019). CenterNet builds on a standard keypoint estimation method and uses a stacked hourglass network as its backbone, as in (Law & Deng, 2018), trained on the MS COCO dataset (Lin et al., 2014). The proposed technique has been implemented in the Python programming language.
Figure 1 shows the general process of the Object Center Displacement tracker (OCDTracker). The proposed technique consists of four stages: region-of-interest selection, optical flow handling, slicing, and tracking.
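The interaction of the optical flow and tracking stages can be sketched as follows. This is a minimal illustrative sketch only: the function name correct_center, the median aggregation of flow vectors, and the (x0, y0, x1, y1) region-of-interest layout are assumptions for illustration, not the authors' exact implementation.

```python
import numpy as np

def correct_center(center, flow, roi):
    """Shift a tracked object's center by the median optical flow
    inside its region of interest (hypothetical correction step)."""
    x0, y0, x1, y1 = roi
    patch = flow[y0:y1, x0:x1]            # flow vectors inside the ROI
    dx = float(np.median(patch[..., 0]))  # robust horizontal displacement
    dy = float(np.median(patch[..., 1]))  # robust vertical displacement
    return (center[0] + dx, center[1] + dy)

# Toy example: a uniform flow of (+2, -1) pixels everywhere.
flow = np.zeros((100, 100, 2), dtype=np.float32)
flow[..., 0] = 2.0
flow[..., 1] = -1.0
print(correct_center((50.0, 50.0), flow, (40, 40, 60, 60)))  # (52.0, 49.0)
```

The median (rather than the mean) is used in this sketch so that a few outlier flow vectors inside the region of interest do not drag the corrected center off the object.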
2.1 Optical Flow
Optical flow is the apparent motion of objects in the image as the objects, the scene, or the camera move between two consecutive frames. It is a two-dimensional vector field of within-image translation (Solem, 2012).
Consider a pixel I(x, y, t) in the first frame (a new dimension, time, is added). It moves by a distance (dx, dy) in the next frame, taken after a time dt (Mordvintsev & Abid, 2017):

I(x, y, t) = I(x + dx, y + dy, t + dt)    (1)

OpenCV contains several optical flow implementations; the authors use the method of (Farnebäck, 2003), which is considered one of the best methods for obtaining dense flow fields (Solem, 2012).
2.2 CenterNet
CenterNet (Zhou et al., 2019) represents objects by a single point at the center of their bounding box. In this model, properties such as object size, dimension, orientation, and pose are regressed directly from image features at the center position. Objects are detected with a standard keypoint estimation method: the input image is fed to a fully convolutional network (FCNN) that generates a heat-map, whose peaks (i.e., local maxima) correspond to object centers.
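The peak-extraction step can be sketched in NumPy as follows. CenterNet itself realizes this with a 3x3 max-pooling over the network's heat-map; the helper name heatmap_peaks and the threshold value here are illustrative assumptions.

```python
import numpy as np

def heatmap_peaks(heat, thresh=0.3):
    """Return (row, col) coordinates of local maxima in a heat-map.

    A pixel counts as a peak if it equals the maximum of its 3x3
    neighbourhood and exceeds `thresh` (a stand-in for the max-pool NMS)."""
    padded = np.pad(heat, 1, mode="constant", constant_values=-np.inf)
    # Stack the 3x3 neighbourhood of every pixel and take the maximum.
    neigh = np.stack([padded[r:r + heat.shape[0], c:c + heat.shape[1]]
                      for r in range(3) for c in range(3)])
    local_max = neigh.max(axis=0)
    ys, xs = np.where((heat == local_max) & (heat > thresh))
    return list(zip(ys.tolist(), xs.tolist()))

# Two isolated responses -> two detected object centers.
heat = np.zeros((32, 32))
heat[8, 10] = 0.9   # first object center
heat[20, 25] = 0.7  # second object center
print(heatmap_peaks(heat))  # [(8, 10), (20, 25)]
```

Because every peak directly yields an object center, no anchor boxes or box-level non-maximum suppression are needed; the remaining properties (size, pose) are then read from the regression heads at those same coordinates.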