Object Detection and Tracking in Real Time Videos

Christian R. Llano (Department of Industrial Engineering, University of Miami, Coral Gables, USA), Yuan Ren (Department of Industrial Engineering, Shanghai Dianji University, Shanghai Shi, China) and Nazrul I. Shaikh (Department of Industrial Engineering, University of Miami, Coral Gables, USA)
Copyright: © 2019 |Pages: 17
DOI: 10.4018/IJISSS.2019040101


Object and human tracking in streaming videos is one of the most challenging problems in computer vision. In this article, we review relevant machine learning algorithms and techniques for human identification and tracking in videos. We provide details on the metrics and methods used in the computer vision literature for monitoring and propose a state-space representation of the object tracking problem. A proof-of-concept implementation of state-space-based object tracking using particle filters is presented as well. The proposed approach enables tracking of objects and humans in a video, including foreground/background separation for detecting object movement.

1. Introduction

It is arguable that one of the most important advances in modern computer science is the development of computer vision and visual object tracking (VOT) techniques (Szeliski, 2011). The ever-increasing capacity of processors, servers, and cloud solutions makes it possible to read, store, and analyze video data in almost all activities and environments. According to some online sources, one of the major video-sharing websites has around 1,325 million users, 300 hours of video uploaded every minute, and 4,950 million videos¹, while more than ninety percent of mobile viewers share video and fifty-two percent of mobile traffic comes from searches for video².

In video data analysis, VOT is concerned with sequentially localizing one or more objects in real time by exploiting information from imaging devices through fast, model-based computer vision and image-understanding techniques (Panin, 2011). Advances in areas such as artificial intelligence and machine learning have produced techniques whose performance is comparable to that of the human eye (Lu and Tang, 2015). However, processing time remains a limitation, which keeps researchers in pursuit of efficient and reliable models for recognition and tracking in video.

VOT applications are ubiquitous and are not limited to human recognition. Significant applications include surveillance and security systems, traffic monitoring, in-vehicle systems for automobile tracking, and medical diagnosis systems (Kim et al., 2010). For this paper and future work, we define the problem in terms of video surveillance of humans in outdoor environments. This setting frames some of the issues we intend to address and also serves as the starting point for introducing basic concepts in computer vision. Throughout this work, we consider cameras as the source of video information.

A video is regarded as a sequence of images, or frames, which in turn are considered the basic elements of video data. We refer to the natural features and the surrounding environment in a given frame as preexistent objects. We consider an outdoor surveillance area well defined if all the preexistent objects are identified in the current frame. In some human tracking settings, a small group of individuals is also predefined as authorized to be in the surveillance area.
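As a rough illustration of how a frame can be compared against its preexistent objects, the sketch below treats a reference image of the empty scene as the background and flags pixels that differ from it. This is a generic frame-differencing example, not the article's implementation; the names `foreground_mask` and `changed_fraction`, the threshold of 30, and the toy 4x4 frames are all assumptions made for illustration.

```python
from typing import List

Frame = List[List[int]]   # grayscale frame: rows of pixel intensities in 0..255
Mask = List[List[bool]]   # True where a pixel differs from the background


def foreground_mask(background: Frame, frame: Frame, threshold: int = 30) -> Mask:
    """Flag pixels whose intensity differs from the background by more than threshold.

    `background` stands in for the preexistent objects of the scene.
    The threshold value is an illustrative assumption.
    """
    same_rows = len(background) == len(frame)
    same_cols = all(len(b) == len(f) for b, f in zip(background, frame))
    if not (same_rows and same_cols):
        raise ValueError("frame and background must have the same shape")
    return [
        [abs(f - b) > threshold for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


def changed_fraction(mask: Mask) -> float:
    """Return the share of pixels flagged as foreground (0.0 for an empty mask)."""
    total = sum(len(row) for row in mask)
    changed = sum(sum(row) for row in mask)
    return changed / total if total else 0.0


# Hypothetical 4x4 scene: a uniform background, then a bright 2x2 object enters.
background = [[10] * 4 for _ in range(4)]
frame = [row[:] for row in background]
for r in (1, 2):
    for c in (1, 2):
        frame[r][c] = 200

mask = foreground_mask(background, frame)
print(changed_fraction(mask))  # 4 of the 16 pixels differ from the background
```

A fixed reference image is the simplest possible background model; outdoor scenes with changing light would likely need a background that is updated over time.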

A typical human-surveillance VOT pipeline consists of (a) identifying the movement of a person or crowd; (b) discriminating humans from the surrounding environment, recognizing and excluding possible interference; and (c) following the path of the identified targets during the surveillance period. Of these, task (a) is known in the literature as the recognition task. It involves searching for features of the target (a person or group of individuals) and distinguishing them from the surrounding environment. At this stage, a priori knowledge about the target is crucial, as is a training process or learning technique. Human supervision of the classification process is also required, for instance, to label the individuals to be distinguished during surveillance. The process is usually iterated until the algorithm can identify targets and obtain feedback by itself. This process draws on emerging machine learning methods.

As a multitask problem, it demands that many actions be performed successively and simultaneously. Concretely, our task consists of:

  • Identifying whether any person or group of people is in the area;

  • Identifying the number of persons;

  • Identifying intruders, i.e., unauthorized individuals;

  • Identifying people's activities;

  • Identifying unusual activities.
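The first three items in this list can be sketched as simple bookkeeping over per-frame detections. The example below assumes that an upstream detector and recognizer (not shown, and not specified here) already produce one record per person; the `Detection` and `FrameReport` types, the `summarize_frame` function, and the sample names are hypothetical. Activity recognition, the last two items, is outside the scope of this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Detection:
    """One detected person in a frame (hypothetical record for illustration)."""
    track_id: int
    identity: Optional[str]  # None when the person could not be matched to a known identity


@dataclass
class FrameReport:
    anyone_present: bool     # is any person in the area?
    person_count: int        # how many persons are in the area?
    intruder_ids: List[int]  # track ids of individuals who are not authorized


def summarize_frame(detections: List[Detection], authorized: Set[str]) -> FrameReport:
    """Answer the presence, count, and intruder questions for a single frame.

    Assumption: an unrecognized person (identity None) is treated as an intruder.
    """
    intruders = [d.track_id for d in detections if d.identity not in authorized]
    return FrameReport(
        anyone_present=bool(detections),
        person_count=len(detections),
        intruder_ids=intruders,
    )


# Hypothetical frame with one authorized person, one unknown, one known but unauthorized.
authorized = {"alice", "bob"}
detections = [
    Detection(track_id=1, identity="alice"),
    Detection(track_id=2, identity=None),
    Detection(track_id=3, identity="mallory"),
]
report = summarize_frame(detections, authorized)
print(report.person_count, report.intruder_ids)
```

Treating unrecognized individuals as intruders is a conservative design choice for surveillance; a deployed system might instead route them to a human operator for labeling.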
