In computer vision, 3D modeling refers to the process of developing a 3D representation of real-world objects through a systematic procedure. 3D models can be built from geometric information about the object or scene to be modeled using CAD/CAM software. However, this approach requires prior knowledge of the objects in the scene, such as their dimensions and the distance from each object to the camera. To make 3D models more photorealistic and more convenient to build, images of the objects can be used instead. In this chapter, the authors propose a method to extract a 3D model from a single-view perspective image. The approach is based on edge length and exploits symmetric objects in the scene. An application of the proposed method to touring into the picture is then discussed.
Introduction
There are two main reasons for the shift from stereo vision (3D modeling with multiple images) to monocular vision (3D modeling with a single image). First, it allowed researchers to understand the importance of monocular cues and how useful they are when combined with binocular cues: 3D reconstruction is more visually pleasing when the two are combined. Second, it allowed researchers to elucidate which kinds of monocular cues are useful for depth perception. Further, monocular cameras are cheaper, and their installation is less complex than that of stereo cameras. Reconstruction from single-view images works well even at larger distances, whereas in stereo vision the accuracy is limited by the baseline distance between the two cameras. When the distance between the cameras becomes large, surfaces in the images exhibit different degrees of occlusion, large disparities, and similar effects, all of which make it more difficult for a computer to determine the depth of the scene accurately. For these reasons, recent work on 3D reconstruction mainly uses single-view images, an approach called Single View Modeling (SVM).
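The baseline limit mentioned above can be made concrete with the standard rectified pinhole stereo model, in which depth is Z = f·B/d and a disparity error Δd produces a depth error of roughly Z²/(f·B)·Δd. The following is a minimal sketch under that assumption; the function names and the numeric values (focal length in pixels, baseline in metres, a one-pixel disparity error) are illustrative and are not taken from the chapter.

```python
def stereo_depth(focal_px, baseline_m, disparity_px):
    """Depth Z = f * B / d for a rectified stereo pair (pinhole model)."""
    return focal_px * baseline_m / disparity_px


def depth_error(focal_px, baseline_m, depth_m, disparity_error_px=1.0):
    """First-order depth uncertainty: dZ ~= Z^2 / (f * B) * dd.

    The error grows with the square of the distance and shrinks as the
    baseline B grows, which is why a short baseline limits accuracy.
    """
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_error_px


if __name__ == "__main__":
    focal_px = 1000.0    # assumed focal length, in pixels
    baseline_m = 0.1     # assumed 10 cm baseline
    for depth_m in (5.0, 10.0, 20.0):
        err = depth_error(focal_px, baseline_m, depth_m)
        print(f"depth {depth_m:5.1f} m -> uncertainty about {err:.2f} m")
```

Under these illustrative values, doubling the distance roughly quadruples the depth uncertainty. Widening the baseline reduces the uncertainty, but at the cost of the occlusion and large-disparity problems described above.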
SVM refers to building three-dimensional models from a single image. The literature (Seitz, 2001; Criminisi, 1999; Debevec, 1996) suggests that 3D reconstruction from a single image must necessarily be an interactive process in which the user provides information about the scene structure. Such information may take the form of vanishing points or vanishing lines, co-planarity, spatial inter-relationships of features, surface normals, and camera parameters. Some of the traditional approaches based on shape, shading, and texture require complicated user interaction to specify the inputs.
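As an illustration of one such user-supplied cue, a vanishing point can be estimated as the image intersection of two line segments that the user marks as parallel in the scene. The sketch below uses the standard homogeneous-coordinate construction (a line is the cross product of two points, and an intersection is the cross product of two lines). It is not the method proposed in this chapter, and the function names and sample coordinates are hypothetical.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def line_through(p, q):
    """Homogeneous line through image points p = (x, y) and q = (x, y)."""
    return cross((p[0], p[1], 1.0), (q[0], q[1], 1.0))


def vanishing_point(seg_a, seg_b, eps=1e-12):
    """Intersect two image segments that are parallel in the scene.

    Returns (x, y) in image coordinates, or None when the segments are
    also parallel in the image (the vanishing point lies at infinity).
    """
    v = cross(line_through(*seg_a), line_through(*seg_b))
    if abs(v[2]) < eps:
        return None
    return (v[0] / v[2], v[1] / v[2])


if __name__ == "__main__":
    # Two hypothetical user-marked edges, e.g. the sides of a corridor.
    left_edge = ((0.0, 0.0), (2.0, 1.0))
    right_edge = ((0.0, 4.0), (2.0, 3.0))
    print(vanishing_point(left_edge, right_edge))
```

In practice, more than two segments are usually marked and the intersection is fitted in a least-squares sense, since hand-drawn lines rarely meet at exactly one point.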
Recent work covers various kinds of 3D modeling methods. A little user interactivity has been shown to be effective in reconstructing a 3D model (Seitz, 2001): high-quality results were obtained on images with limited perspective distortion, but only the visible surfaces in an image could be modeled in 3D, which leads to holes near the occluded boundaries. Another algorithm, introduced later by Feng Han, reconstructed 3D shapes and scenes of an object from prior experience or knowledge using Bayesian reconstruction (Han, 2003). Derek Hoiem later proposed a fully automatic method for creating virtual walkthroughs from a single photograph. Although the proposed algorithm did not work on every image, surprisingly good results were obtained on a wide range of images (Hoiem, 2005). The approach proposed by Tal Hassner was interesting because 3D reconstruction was done with the help of a database of 2D images. A large set of probable images was stored in the database together with their depth maps (Hassner & Basri, 2006). The input image is compared with the images in the database, the most probable match is selected, and the probable depth is estimated from it. Hassner's approach provided accurate results but did not do well on unstructured objects such as hands. Another reconstruction technique was proposed more recently by A. Saxena, in which a Markov Random Field (MRF) was used with only the 2D image given as input. No particular assumptions were made in this approach, which was beneficial (Saxena & Ng, 2007). This approach created 3D models that were visually pleasing to the user's eye.