Cross-Domain Usage in Real-Time Video-Based Tracking

Satbir Singh (Delhi Technological University, India), Rajiv Kapoor (Delhi Technological University, India) and Arun Khosla (Dr. B.R. Ambedkar NITJ, India)
DOI: 10.4018/978-1-5225-2848-7.ch005

This chapter emphasizes approaches that incorporate information from different types of sensors into real-time tracking in the visible domain. Since no individual sensor can retrieve complete information, it is preferable to combine information from distinct categories of sensors. The chapter first highlights the significance of introducing cross-domain treatment into video-based tracking. Next, previous work in the literature related to this idea is briefly reviewed. The chapter then categorizes cross-domain usage for real-time object tracking and discusses each category separately in detail, covering the advantages as well as the limitations of each type of supplementary cross-domain information. Finally, the authors present recommendations and concluding remarks on the future development of this cutting-edge field.
Chapter Preview


Tracking, in its simplest form, can be defined as the problem of estimating the trajectory of an object as it moves around a scene. Earlier approaches to tracking an object in a video sequence applied an intelligent tracking algorithm to the image-frame information contained in the sequence. This technique worked well for some time, but a new research field evolved because visual information alone cannot locate an object under certain conditions such as low visibility, background similarity, occlusion, and a limited field of view. The new practice introduces data from different sensing modalities and merges this information to obtain a robust tracking estimate in the visible domain itself. Adding another sensor's information to camera information is not a simple process, since it involves several complex stages: calibrating the data for cross-domain usage, selecting a fusion algorithm suitable for decision making, recovering information missing from individual domains, and cancelling mutual (redundant) information between sensors. Because of the numerous advantages and proven results of efficient tracking through cross-domain fusion, awareness of this emerging technology is essential. There is also an urgent need for intelligent video systems to replace human operators in monitoring areas under surveillance. Applications of robust object tracking include motion-based recognition, automated surveillance, video indexing, human-computer interaction, traffic monitoring, vehicle navigation, and various control applications such as accident avoidance, automatic guidance, and head tracking for video conferencing.
Because information from a single domain is incomplete, several challenges arise for reliable tracking: occlusion, shape deformation of the object, changes in lighting conditions, shadow effects, illumination variation, real-time processing requirements, and the loss of information caused by projecting the 3D world onto a 2D image. Since an event or activity generally carries rich multimodal information, information from several modes can be exploited. This chapter covers the role of the following domains, combined with visible imagery, in achieving efficient real-time tracking of objects:

  • 1. Usage of thermal imagery in visible video tracking

  • 2. Usage of audio information in visible video tracking

  • 3. Usage of RADAR information in visible video tracking

  • 4. Usage of LIDAR information in visible video tracking.
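As a minimal illustration of the cross-domain idea running through the categories above, the sketch below fuses 2D position estimates from a visible camera and a second modality (e.g., a thermal sensor) by inverse-variance weighting. The sensor values and noise variances are hypothetical, chosen only for illustration; they do not come from any system described in this chapter.

```python
# Hedged sketch: inverse-variance fusion of 2D position estimates
# from two sensing modalities (all numbers are hypothetical).

def fuse_estimates(measurements):
    """Fuse (x, y, variance) position estimates by inverse-variance weighting.

    measurements: list of (x, y, var) tuples, one per sensor.
    Returns the fused (x, y) position and the fused variance.
    """
    weights = [1.0 / var for _, _, var in measurements]
    total = sum(weights)
    fx = sum(w * x for w, (x, _, _) in zip(weights, measurements)) / total
    fy = sum(w * y for w, (_, y, _) in zip(weights, measurements)) / total
    fused_var = 1.0 / total  # always smaller than any individual variance
    return fx, fy, fused_var

# Visible camera is confident (low variance); thermal is noisier.
camera = (10.0, 5.0, 1.0)    # (x, y, variance) -- hypothetical values
thermal = (12.0, 6.0, 4.0)
x, y, v = fuse_estimates([camera, thermal])
```

The fused estimate lands closer to the more confident sensor, and its variance is lower than either input's, which is the basic payoff of combining modalities.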



Because of the limitations of tracking with visible information alone, various techniques evolved that complement the video tracking procedure with secondary information from other modalities.

To enhance vision capabilities, multi-view and stereo tracking were employed first. Bakhtari, Naish, Eskandari, Croft, and Benhabib (2006) carried out an experiment to improve the quality of a surveillance system using one static overhead camera and four mobile cameras, but only a single target's position and orientation were tracked in this basic research. A method for detecting and tracking multiple people using stereo vision was presented by Muñoz-Salinas, García-Silvente, and Carnicer (2008). In this method, depth, color, and gradient information were used to track people in complex situations with a multiple-particle-filter algorithm, each particle filter corresponding to an individual target. Although the approach performed well under conditions such as occlusion, people jumping and running, shaking hands, and swapping positions, it could not cope with degraded viewing conditions. To extend the vision capability of a single camera, a multi-camera technique was proposed by Lin and Huang (2011), in which the field of view (FOV) for object detection and tracking was enlarged using overlapping and non-overlapping FOVs of distributed cameras; a Kalman filter was used to track objects, and a homography mapping technique provided a single continuous tracking result. Although various problems of single-sensor visible tracking were remedied by incorporating multiple camera views, making tracking more accurate, there remained room for improvement in cases where visible-light information is itself insufficient to characterize the scene and the object.
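The Kalman filter used for tracking in the work above can be sketched, in its simplest form, as a one-dimensional constant-velocity tracker. The motion model, noise parameters, and measurements below are illustrative assumptions only, not values taken from Lin and Huang (2011).

```python
import numpy as np

# Hedged sketch: 1D constant-velocity Kalman filter, state = [position, velocity].
# Process noise q and measurement noise r are illustrative assumptions.

def kalman_track(measurements, dt=1.0, q=0.01, r=0.5):
    F = np.array([[1.0, dt], [0.0, 1.0]])     # constant-velocity motion model
    H = np.array([[1.0, 0.0]])                # we observe position only
    Q = q * np.eye(2)                         # process noise covariance
    R = np.array([[r]])                       # measurement noise covariance
    x = np.array([[measurements[0]], [0.0]])  # initialize from first measurement
    P = np.eye(2)                             # initial state covariance
    estimates = []
    for z in measurements:
        # Predict step: propagate state and covariance through the motion model.
        x = F @ x
        P = F @ P @ F.T + Q
        # Update step: correct the prediction with the new measurement.
        y = np.array([[z]]) - H @ x           # innovation
        S = H @ P @ H.T + R                   # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)        # Kalman gain
        x = x + K @ y
        P = (np.eye(2) - K @ H) @ P
        estimates.append(float(x[0, 0]))
    return estimates

# Noisy position observations of a target moving roughly one unit per frame.
track = kalman_track([0.1, 1.2, 1.9, 3.1, 4.0, 5.2])
```

In a multi-camera setting such as the one described above, a filter of this kind would run on measurements mapped into a common coordinate frame (e.g., via homography), so that a target leaving one camera's FOV can be handed off to another without restarting the track.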
