Incorporation of Depth in Two Dimensional Video Captures: Review of Current Trends and Techniques

Manami Barthakur (Gauhati University, India) and Kandarpa Kumar Sarma (Gauhati University, India)
DOI: 10.4018/978-1-4666-9685-3.ch004

Stereoscopic vision in cameras is an interesting field of study. This type of vision is important for incorporating depth in video images, which is needed to measure the distance of an object from the camera properly, i.e. for converting a two dimensional (2D) video into a three dimensional (3D) video. In this chapter, some of the basic theoretical aspects of the methods for estimating depth in 2D video and the current state of research are discussed. These methods are frequently used in algorithms for estimating depth in 2D to 3D video conversion techniques. Some recent algorithms for incorporating depth in 2D video are also discussed, and, from the literature review, a simple and generic system for incorporating depth in 2D video is presented.
Chapter Preview

1. Introduction

The study of stereoscopic vision is receiving widespread attention, primarily because it is likely to enable bio-inspired vision capability (Knorr, Smolic & Sikora, 2007). Developments and innovations in stereoscopic vision are important for the creation of artificial vision, since they incorporate the third dimension of depth in video images. Conversion of two dimensional (2D) video images into three dimensional (3D) form mainly deals with the ability to properly measure and incorporate the depth component. It involves giving due importance to the distance of an object from the camera while making decisions about the object, and using the derived inference for process control. Human vision perceives the third dimension of depth in the form of binocular disparity. The human eyes are located at slightly different positions, so they perceive different views of their surroundings, and the brain then reconstructs the depth information from these views. Stereoscopic vision takes advantage of this phenomenon. Two slightly different images of every scene are used, and the points in one image are matched with their corresponding points in the other image. Then the disparity, i.e. the amount of shift between corresponding points in the two images, is calculated. Disparity is inversely proportional to depth. Therefore, the higher the disparity, the smaller the depth and the closer the point is to the camera (Wei, 2005). Thus a 3D video can be realized from a 2D video with an appropriate disparity and calibration of parameters.
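The matching and disparity steps described above can be illustrated with a minimal sketch. It is not the chapter's algorithm: it assumes a rectified stereo pair (corresponding points lie on the same image row), uses a sum-of-absolute-differences (SAD) window as a stand-in matching cost, and assumes the standard pinhole relation Z = f * B / d between depth Z, focal length f (in pixels), camera baseline B and disparity d. The function names `match_pixel` and `disparity_to_depth` and all numeric values are hypothetical.

```python
def match_pixel(left_row, right_row, x, half_window=1, max_disparity=4):
    """Estimate the disparity (in pixels) of left_row[x] by SAD matching.

    Assumption: the pair is rectified, so a point at column x in the left
    image appears at column x - d in the right image, with d >= 0.
    """
    best_d, best_cost = 0, float("inf")
    for d in range(max_disparity + 1):
        cost = 0
        for k in range(-half_window, half_window + 1):
            xl, xr = x + k, x + k - d
            # Skip candidate shifts whose window falls outside either row.
            if xl < 0 or xl >= len(left_row) or xr < 0 or xr >= len(right_row):
                cost = float("inf")
                break
            cost += abs(left_row[xl] - right_row[xr])
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Depth from disparity under the assumed pinhole model Z = f * B / d."""
    if disparity_px <= 0:
        return float("inf")  # zero disparity: point is effectively at infinity
    return focal_px * baseline_m / disparity_px


# Toy scanlines: the bright feature sits at column 4 on the left, column 2 on the right.
left = [10, 10, 10, 50, 90, 50, 10, 10, 10, 10]
right = [10, 50, 90, 50, 10, 10, 10, 10, 10, 10]

d = match_pixel(left, right, x=4)
print(d)                                                  # 2
print(disparity_to_depth(d, focal_px=800.0, baseline_m=0.25))  # 100.0
```

Doubling the disparity in this sketch halves the returned depth, which is the inverse proportionality stated in the text. A practical system would also need camera calibration to obtain the focal length and baseline.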

Three-dimensional video may be the next step in the evolution of motion picture formats. Interest in 3D video has led to the development of industries that fabricate products such as TVs, mobile phones, monitors and other display devices capable of displaying 3D images. 3D video has several applications in robotics, entertainment, and surveillance. A 3D image of a person's face might be used in biometrics instead of fingerprint, iris, face, voice and DNA recognition techniques for identity management and security. The same technique might also be applied to the analysis of security footage from closed-circuit television (CCTV) cameras in crime investigation or in searching for missing persons.

Generation of 3D content is an important step in producing 3D videos or images. Several special cameras have been designed for direct generation of 3D models. A stereoscopic dual camera is one such camera. It uses two separate monoscopic cameras in a co-planar configuration, each capturing the image for one eye. Depth information is then obtained using binocular disparity. Another example is the depth-range camera, which includes a laser element. The camera captures a normal 2D image and generates its corresponding depth map. The depth map is a 2D function that gives the depth of an object from the camera as a function of the image coordinates, in the form of a grey level image whose pixel values represent depth. The laser element is used to construct the depth map: it emits light towards the object being captured, the light is reflected back after hitting the object, and the reflection is registered to construct the depth map (Wei, 2005). A further example of direct 3D capture is the system designed by Mitsubishi Electric Research Laboratories (MERL) (Matusik & Pfister, 2004), whose framework uses 16 cameras at different viewpoints and obtains the 3D video data directly.
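The idea of a depth map stored as a grey level image can be sketched as follows. This is only an illustration under stated assumptions: depths are given as a 2D list in arbitrary units, they are scaled linearly to 8-bit grey levels, and nearer points are drawn brighter. The text does not specify the scaling or the polarity, so both are assumptions, and the function name `depth_to_grey` is hypothetical.

```python
def depth_to_grey(depth_map, near_is_bright=True):
    """Map a 2D list of depth values to 8-bit grey levels (0..255).

    Assumptions: linear scaling between the minimum and maximum depth in the
    map, and (by default) nearer points rendered brighter.
    """
    values = [z for row in depth_map for z in row]
    z_min, z_max = min(values), max(values)
    span = z_max - z_min
    grey = []
    for row in depth_map:
        out = []
        for z in row:
            # Normalised depth in [0, 1]; a flat map gets a constant value.
            t = 0.0 if span == 0 else (z - z_min) / span
            if near_is_bright:
                t = 1.0 - t
            out.append(round(255 * t))
        grey.append(out)
    return grey


# Depths in metres for a tiny 2 x 2 image.
depths = [[1.0, 2.0],
          [3.0, 5.0]]
grey = depth_to_grey(depths)
print(grey[0][0], grey[1][1])  # 255 0  (nearest pixel brightest, farthest darkest)
```

Each pixel of the result sits at the same image coordinates as the corresponding pixel of the 2D image, so the pair together carries both appearance and depth.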

Although all the techniques described above contribute to the prevalence of 3D-TV, they can be rather expensive and difficult to set up. Moreover, a user may want to view in 3D content that was captured by only one camera, and these techniques cannot meet that need. In addition, there is a huge amount of current and past media data in 2D format, and it should be possible to view this data with a stereoscopic effect. For these reasons, 2D to 3D conversion methods are needed. Further, with stereoscopic vision, real time surveillance can be more reliable, and it can also be an effective addition to process control and robotic vision. In most cases, the limitation of a video camera's 2D presentation can be rectified, and the feeds made more effective, by incorporating 3D vision.
