Visual Tracking With Object Center Displacement and CenterNet

Merouane Labeni, Chaouki Boufenar, Mokhtar Taffar
Copyright: © 2022 | Pages: 17
DOI: 10.4018/IJCVIP.290397

Abstract

Modern artificial intelligence systems have revolutionized approaches to scientific and technological challenges in a variety of fields, and remarkable improvements in the quality of state-of-the-art computer vision and other techniques have been observed. Object tracking in video frames is a vital field of research that provides information about objects and their trajectories. This paper presents an object tracking method based on the optical flow generated between frames and a ConvNet method. Initially, optical center displacement is employed to predict the bounding box center of the tracked object. Then, CenterNet is used to correct the object position. Given the initial set of points (i.e., the bounding box) in the first frame, the tracker follows the motion of the center of these points by looking at its direction of change in the optical flow computed with the next frame; a correction mechanism monitors this motion and, once it surpasses a correction threshold, triggers a position correction.

1 Introduction

Visual tracking is an important research area in computer vision and is critical for many applications, including surveillance, traffic monitoring, video indexing, human-machine interaction, and autonomous driving. Although existing trackers have achieved impressive progress in recent years, designing a robust tracker is still a challenging problem. In practice, probabilistic approaches (Kristan et al., 2008) and (Pérez et al., 2002) that globally model the tracked object’s appearance have proven to be very successful. However, significant appearance changes caused by factors that commonly occur in real-life scenarios, such as occlusion, scale variation, fast motion, deformation, and illumination variation, pose serious problems for such models. The reason is that these factors lead to reduced matches and drifting, which eventually cause the tracker to fail. Improvements to the visual model (Babenko et al., 2011), (Kalal et al., 2010), (Bolme et al., 2010), and (Grabner et al., 2006) can potentially increase a tracker’s performance, but they also raise additional questions about when the visual model should be updated and which of its parts should be retained. With the advent of high-performing object detection models (Ren et al., 2015) and (Zhou et al., 2019), a powerful alternative emerged: tracking-by-detection, also called tracking-following-detection (Zhou et al., 2020) and (Tang et al., 2017). Tracking-by-detection leverages the power of deep-learning-based object detectors and is currently the dominant tracking paradigm. However, even the best object trackers are not without drawbacks.

This paper presents a method for correcting object tracking coordinates based on optical flow and a ConvNet method called CenterNet (Zhou et al., 2019). The latter is based on a standard keypoint estimation method and uses a stacked hourglass network as its backbone, as in (Law & Deng, 2018), trained on the MS COCO dataset (Lin et al., 2014). The proposed technique was implemented in the Python programming language.

Figure 1 shows the general process of the Object Center Displacement Tracker (OCDT). The proposed technique consists of four stages: region-of-interest selection, optical flow handling, slicing, and tracking.

Figure 1. OCDT general process
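
As a rough Python sketch of this loop (an illustration under the assumptions stated in the abstract, not the authors' implementation; the threshold value and the flow_fn/detect_fn callbacks are hypothetical placeholders for the Farnebäck flow and CenterNet components described below):

import numpy as np

CORRECTION_THRESHOLD = 20.0  # pixels; illustrative value, not from the paper

def ocdt_track(frames, initial_box, flow_fn, detect_fn):
    """Sketch of the OCDT loop.

    frames      -- iterable of video frames
    initial_box -- (x, y, w, h) selected in the first frame
    flow_fn     -- flow_fn(prev, cur) -> dense flow field of shape (H, W, 2);
                   stands in for the Farneback computation of Section 2.1
    detect_fn   -- detect_fn(frame) -> (x, y, w, h); stands in for CenterNet
    Both callbacks are hypothetical placeholders, not the authors' code.
    """
    frames = iter(frames)
    prev = next(frames)
    box = list(initial_box)
    accumulated = 0.0

    for frame in frames:
        flow = flow_fn(prev, frame)

        # Displace the box by the mean flow vector inside it (slicing step).
        x, y, w, h = [int(round(v)) for v in box]
        dx, dy = flow[y:y + h, x:x + w].reshape(-1, 2).mean(axis=0)
        box[0] += dx
        box[1] += dy
        accumulated += float(np.hypot(dx, dy))

        # Correction: once accumulated motion exceeds the threshold,
        # re-detect with CenterNet and reset the accumulator.
        if accumulated > CORRECTION_THRESHOLD:
            box = list(detect_fn(frame))
            accumulated = 0.0

        prev = frame
        yield tuple(box)

In this sketch, resetting the accumulator after each correction keeps detector calls infrequent while bounding the drift between corrections.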

2 Background Information

2.1 Optical Flow

Optical flow is the apparent image motion of objects as the objects, the scene, or the camera moves between two consecutive images. It is a two-dimensional vector field of within-image translation (Solem, 2012).

Consider a pixel $I(x, y, t)$ in the first frame (a new dimension, time, is added). It moves by a distance $(dx, dy)$ in the next frame, taken after time $dt$. Assuming its intensity does not change between the two frames (Mordvintsev & Abid, 2017):

$$I(x, y, t) = I(x + dx, y + dy, t + dt) \qquad (1)$$
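
Expanding the right-hand side of Equation (1) to first order and dividing by $dt$ yields the classical optical flow constraint on which dense methods such as Farnebäck's build. This derivation is standard (it follows the OpenCV tutorial cited above) and is added here only for context:

$$\frac{\partial I}{\partial x}\,u + \frac{\partial I}{\partial y}\,v + \frac{\partial I}{\partial t} = 0, \qquad u = \frac{dx}{dt}, \quad v = \frac{dy}{dt}$$

where $(u, v)$ is the optical flow vector sought at each pixel.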

OpenCV contains several optical flow implementations; the authors use the method based on (Farnebäck, 2003), which is considered one of the best methods for obtaining dense flow fields (Solem, 2012).
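
For concreteness, a dense Farnebäck flow field can be computed with OpenCV as sketched below; the parameter values are the illustrative defaults from the OpenCV documentation, not necessarily the ones used by the authors, and the video path and bounding box are placeholders.

import cv2

cap = cv2.VideoCapture("video.mp4")  # placeholder input video
ok, prev_frame = cap.read()
ok, next_frame = cap.read()

prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
next_gray = cv2.cvtColor(next_frame, cv2.COLOR_BGR2GRAY)

# Dense Farneback flow; positional arguments are pyr_scale, levels,
# winsize, iterations, poly_n, poly_sigma, flags.
flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                    0.5, 3, 15, 3, 5, 1.2, 0)

# flow[v, u] holds the (dx, dy) displacement of pixel (u, v); averaging
# over a bounding box approximates the displacement of its center.
x, y, w, h = 100, 100, 50, 50  # placeholder box
dx, dy = flow[y:y + h, x:x + w].reshape(-1, 2).mean(axis=0)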

2.2 CenterNet

The CenterNet model (Zhou et al., 2019) represents objects by a single point at their bounding box center. In this model, properties such as object size, dimension, orientation, and pose are regressed directly from image features at the center position. Objects are detected with a standard keypoint estimation method: the input image is fed to a fully convolutional network that generates a heat map whose peaks (i.e., local maxima) correspond to object centers.
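
To illustrate the decoding step, the published CenterNet reference implementation extracts peaks by keeping only heat-map locations that equal their 3x3 local maximum (a max-pooling non-maximum suppression) and then taking the top-k scores. A minimal PyTorch sketch of that idea (not the authors' code; the tensor layout is assumed to be batch x classes x H x W) is:

import torch
import torch.nn.functional as F

def decode_centers(heatmap, k=100):
    """heatmap: (batch, classes, H, W) tensor of center scores in [0, 1]."""
    # Keep a location only if it equals its 3x3 local maximum (max-pool NMS).
    pooled = F.max_pool2d(heatmap, kernel_size=3, stride=1, padding=1)
    peaks = heatmap * (pooled == heatmap).float()

    # Take the k highest-scoring peaks per image and recover their
    # class index and (x, y) position on the heat-map grid.
    batch, classes, height, width = peaks.shape
    scores, indices = torch.topk(peaks.view(batch, -1), k)
    class_ids = indices // (height * width)
    ys = (indices % (height * width)) // width
    xs = indices % width
    return scores, class_ids, xs, ys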
