Object detection and shape reconstruction from images play a vital role in computer vision, computer graphics, optics, optimization theory, statistics, and various other fields. The goal of object detection is to enable machines to locate and identify the objects present in a given image or video. The image an object forms in the human eye or on a camera sensor is determined by the object's shape, reflectance, and illumination. This overview covers the estimation of object shape, reflectance, and illumination, along with recent object detection algorithms and the datasets used in recent research. Classic photometric stereo aims to reconstruct surface orientation from multiple images with known reflectance and illumination. Object detection approaches find the various pertinent objects in a single image together with their locations. Several datasets are available for object detection, some of which are addressed here.
1. Introduction
Object detection combines various computer vision techniques to build applications such as locating, tracking, and counting objects, or detecting outliers and anomalies in an environment. This overview addresses the estimation of object shape, reflectance, and illumination, as well as recent object detection algorithms and datasets. Object detection methods detect and locate various objects in a given image or video. Various datasets for object detection have been released; some of them are addressed in the following sections. An image is formed by the intrinsic properties of the scene, namely shape, reflectance, and illumination; this is often referred to as the physics of image formation. Computer vision accordingly decomposes an image into intrinsic components such as shape and reflectance. Determining the intrinsic components of an image is a difficult task, so the general approach is to assume one or more components are known and estimate the remaining ones.
Generally, the input is a set of images, or segments, of an object under different lighting conditions, and the target is the three-dimensional shape, for example expressed as a set of surface normals. Traditional approaches to intrinsic images assume Lambertian materials and constrain either the reflectance (Barrow & Tenenbaum, 1978) or the illumination under controlled lighting (Horn & Brooks, 1989).
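As a concrete illustration of this setup, classic Lambertian photometric stereo recovers per-pixel surface normals and albedo from k images taken under known directional lights by solving a per-pixel linear least-squares problem. The sketch below, in Python/NumPy, is a minimal version of that idea; the function name, array layout, and the small clamping constant are illustrative choices, not taken from the cited works.

```python
import numpy as np

def photometric_stereo(I, L):
    """Minimal Lambertian photometric stereo (illustrative sketch).

    I: (k, n) matrix of k images, each flattened to n pixels.
    L: (k, 3) matrix of known unit light directions.
    Solves I = L @ G per pixel, where G = albedo * normal.
    """
    G, *_ = np.linalg.lstsq(L, I, rcond=None)   # (3, n) scaled normals
    albedo = np.linalg.norm(G, axis=0)          # per-pixel albedo
    normals = G / np.maximum(albedo, 1e-8)      # unit surface normals
    return normals, albedo

# Toy usage: 4 known lights, a flat surface with albedo 0.9, 4 pixels.
L = np.array([[0, 0, 1], [1, 0, 1], [0, 1, 1], [-1, 0, 1]], float)
L /= np.linalg.norm(L, axis=1, keepdims=True)
true_G = np.tile([[0.0], [0.0], [0.9]], (1, 4))
I = np.clip(L @ true_G, 0, None)                # Lambertian rendering
normals, albedo = photometric_stereo(I, L)
```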
Figure 1. Basic diagram of the illumination model
Figure 1 shows the basic structure of the illumination model used to estimate the illumination, reflectance, and colour of a surface.
Using deep learning, the reflectance and illumination of a given image can be estimated in two steps, where the single input image shows a single-material specular object under natural illumination. First, the object's reflectance map is estimated from the image; then it is decomposed into illumination and reflectance. In the second step, a convolutional neural network (CNN) architecture reconstructs the illumination map and the Phong reflectance parameters from the reflectance map (Georgoulis et al., 2018). Another method uses a Bayesian framework to estimate shape and reflectance under known, uncontrolled illumination in real-world scenes, combining the Directional Statistics Bidirectional Reflectance Distribution Function model with a non-parametric illumination model (Lombardi & Nishino, 2015).
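A minimal sketch of such a two-stage pipeline is shown below, assuming PyTorch. The layer sizes, the coarse 16x16 illumination map, and the seven Phong parameters (diffuse RGB, specular RGB, shininess) are illustrative placeholders, not the architecture of Georgoulis et al. (2018).

```python
import torch
import torch.nn as nn

class ReflectanceMapNet(nn.Module):
    """Stage 1: input image -> reflectance map (illustrative layer sizes)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )
    def forward(self, image):
        return self.net(image)

class DecompositionNet(nn.Module):
    """Stage 2: reflectance map -> illumination map + Phong parameters."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Phong reflectance: diffuse RGB (3) + specular RGB (3) + shininess (1)
        self.phong_head = nn.Linear(64, 7)
        self.illum_decoder = nn.Sequential(
            nn.Linear(64, 16 * 16 * 3),
            nn.Unflatten(1, (3, 16, 16)),   # coarse illumination map
        )
    def forward(self, refl_map):
        z = self.encoder(refl_map)
        return self.illum_decoder(z), self.phong_head(z)

image = torch.rand(1, 3, 128, 128)               # single specular-object image
refl_map = ReflectanceMapNet()(image)            # step 1: reflectance map
illum_map, phong = DecompositionNet()(refl_map)  # step 2: decomposition
```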
Shape, Illumination, and Reflectance from Shading (SIRFS) was proposed to estimate the intrinsic properties (illumination, shape, and reflectance) from shading. This approach takes an image of an object as input and outputs estimates of its shape, reflectance, illumination, surface normals, and shading (Barron & Malik, 2015). A shape-from-shading (SFS) algorithm can also estimate shape under natural illumination: it recovers the surface normals of a diffuse object with constant albedo from a single image under known but uncontrolled illumination (Johnson & Adelson, 2011).
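The core of such shading-based estimation is a Lambertian image-formation term. The sketch below shows the data term an SFS-style solver would minimize over the normal field; for simplicity it assumes a single known directional light, whereas Johnson & Adelson (2011) handle general natural illumination, and the function names are illustrative.

```python
import numpy as np

def render_shading(normals, light, albedo=1.0):
    """Lambertian shading s = albedo * max(0, n . l).

    normals: (h, w, 3) unit surface normals; light: (3,) unit direction.
    """
    return albedo * np.clip(normals @ light, 0.0, None)

def sfs_data_term(normals, image, light, albedo=1.0):
    """Squared reprojection error an SFS solver would minimize."""
    return np.sum((render_shading(normals, light, albedo) - image) ** 2)
```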
The Deep Convolutional Neural Fields method was proposed to solve the MAP inference problem and determine an image's depth. In particular, it uses a deep structured learning scheme that learns the unary and pairwise potentials of a continuous conditional random field (CRF) within a deep CNN framework (Liu et al., 2015). A related framework also pairs a deep CNN with a CRF to estimate surface normals and depth values from a single image: the model first maps the multi-scale image to surface normals or depth values at the superpixel level, and the superpixel-level depth estimates are then refined to the pixel level (Li et al., 2015).
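Because the pairwise terms of such a continuous CRF are quadratic in depth, MAP inference reduces to solving a linear system. The sketch below, assuming superpixel depths and precomputed pairwise weights, illustrates that energy and its closed-form minimizer; the variable names and the dense solve are illustrative simplifications of the formulation in Liu et al. (2015).

```python
import numpy as np

def crf_energy(d, z, edges, w):
    """E(d) = sum_i (d_i - z_i)^2 + sum_(i,j) w_ij (d_i - d_j)^2.

    d: candidate superpixel depths; z: CNN unary predictions;
    edges: list of (i, j) neighbouring superpixels; w: pairwise weights.
    """
    unary = np.sum((d - z) ** 2)
    pairwise = sum(w_ij * (d[i] - d[j]) ** 2
                   for (i, j), w_ij in zip(edges, w))
    return unary + pairwise

def crf_map_inference(z, edges, w):
    """Closed-form MAP: setting grad E = 0 gives (I + L) d = z,
    where L is the weighted graph Laplacian of the pairwise terms."""
    n = len(z)
    L = np.zeros((n, n))
    for (i, j), w_ij in zip(edges, w):
        L[i, i] += w_ij; L[j, j] += w_ij
        L[i, j] -= w_ij; L[j, i] -= w_ij
    return np.linalg.solve(np.eye(n) + L, z)
```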