1. Introduction
Face detection aims to determine the existence of a face in a static image and to acquire its accurate location and extent.
Video surveillance systems are well suited to physical security, since video sequences from many remote areas can be presented to observers simultaneously. Face detection is of interest because it is usually an indispensable step in an autonomous video surveillance system.
Face detection is an important research topic in the field of computer vision; surveys of the related literature can be found in (Hjelmås & Low, 2001; Yang & Kriegman & Ahuja, 2001; Zhang & Zhang, 2010). Scale and translation invariance are typically achieved by sliding a window over the input image at multiple resolutions. Research on face detection involves two issues: an effective search strategy and a robust face/non-face classifier. The former affects detection speed, and the latter determines detection accuracy, which is the key issue discussed in this paper.
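The multi-resolution sliding-window search mentioned above can be sketched as follows. The window size, step, and scale factor are illustrative choices, not values taken from any cited paper, and the nearest-neighbour downsampling stands in for a proper image pyramid:

```python
import numpy as np

def sliding_windows(image, window=(24, 24), step=4, scale=1.25):
    """Yield (x, y, scale_factor, patch) for every window position
    across a coarse image pyramid (multi-scale exhaustive search)."""
    h, w = image.shape[:2]
    factor = 1.0
    img = image
    while img.shape[0] >= window[1] and img.shape[1] >= window[0]:
        # slide the fixed-size window over the current pyramid level
        for y in range(0, img.shape[0] - window[1] + 1, step):
            for x in range(0, img.shape[1] - window[0] + 1, step):
                yield x, y, factor, img[y:y + window[1], x:x + window[0]]
        factor *= scale
        new_h, new_w = int(h / factor), int(w / factor)
        if new_h < window[1] or new_w < window[0]:
            break
        # naive nearest-neighbour downsampling; a real system would smooth first
        ys = (np.arange(new_h) * factor).astype(int)
        xs = (np.arange(new_w) * factor).astype(int)
        img = image[ys][:, xs]
```

Each yielded patch would then be passed to the face/non-face classifier; the number of windows generated is what makes the search strategy critical for detection speed.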
Face detection is a complex and challenging pattern-classification problem, and its difficulty has two main aspects: (1) the face's intrinsic variations, caused by (a) complex face appearance, such as different face shapes, skin colors, and expressions, and (b) occlusion by glasses, hair, head ornaments, and other external objects; (2) variations caused by external conditions, such as (a) multiple poses resulting from different imaging angles, including in-plane rotation, in-depth rotation, and up-down rotation, (b) illumination, including brightness, contrast, and shadow, and (c) imaging conditions, such as the camera's focal length, the imaging distance, and the image-acquisition method.
Face detection methods can be classified into two types. The first treats a face as an indivisible whole entity (Jianguo & Tieniu, 2000; Meynet & Popovici & Thiran, 2007). The second is part-based (Heisele & Serre & Poggio, 2007; Yang & Huang, 1994; Samaria & Young, 1994; Nefian & Hayes, 1998; Nefian & Hayes, 1999; Epshtein & Ullman, 2005), which adapts better to changes in pose and partial occlusion than the former. Part-based methods mainly solve two problems: (1) extracting features from parts and establishing the mapping between local features and face parts; (2) modeling the dependencies among face parts.
The mosaic image method presented by Yang and Huang (1994) divided the original image evenly into rectangular cells and established the gray-level distribution of a face. The test image was filtered from low resolution to high resolution according to this distribution. The detection rate is not high, because only a single kind of gray-level feature is used in this method.
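As an illustration of the mosaic idea, the even division into cells and the per-cell gray statistics might look like the sketch below. The grid size and the use of cell means are simplifying assumptions for illustration, not Yang and Huang's exact rules:

```python
import numpy as np

def mosaic_cells(image, rows=4, cols=4):
    """Divide a grayscale image evenly into rows x cols rectangular cells
    and return the mean gray level of each cell (a coarse gray distribution)."""
    h, w = image.shape
    cell_h, cell_w = h // rows, w // cols
    means = np.empty((rows, cols))
    for i in range(rows):
        for j in range(cols):
            cell = image[i * cell_h:(i + 1) * cell_h,
                         j * cell_w:(j + 1) * cell_w]
            means[i, j] = cell.mean()
    return means
```

A candidate region would be accepted at the coarse level only if its cell statistics match the learned face distribution, and then examined again at a finer resolution.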
F.S. Samaria and Nefian et al. presented HMM-based face detection (Samaria & Young, 1994; Nefian & Hayes, 1998; Nefian & Hayes, 1999), considering a face to be composed of five parts: hair, forehead, eyes, nose, and mouth. They divided the face into a series of rectangular blocks. Gray-level features were extracted from each block in (Samaria & Young, 1994), DCT coefficients in (Nefian & Hayes, 1998), and KLT coefficients in (Nefian & Hayes, 1999); in each case, only one kind of feature was extracted. The mapping between rectangular blocks and face parts was acquired by a multi-Gaussian model, but due to the diversity of faces it is difficult to determine the number of Gaussian kernels. In addition, the dependencies between parts were modeled by a one-step transition probability matrix under the Markov assumption. However, this assumption is too strict for face detection: because only this one kind of dependency was modeled, local errors can propagate to other regions.
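A minimal sketch of the one-step transition structure described above, assuming a hypothetical five-state top-to-bottom HMM with illustrative transition probabilities (not the exact models of Samaria or Nefian), together with a standard Viterbi decoding of block-to-part assignments:

```python
import numpy as np

# Hypothetical five-state top-to-bottom face HMM.
states = ["hair", "forehead", "eyes", "nose", "mouth"]

# One-step transition probability matrix (left-to-right structure):
# each state either stays in itself or advances to the next part.
A = np.array([
    [0.5, 0.5, 0.0, 0.0, 0.0],
    [0.0, 0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.0, 0.0, 1.0],
])

def viterbi(log_b, log_A, start=0):
    """Most likely state sequence for per-block log-likelihoods log_b
    (shape T x N) under log-transitions log_A, starting in `start`."""
    T, N = log_b.shape
    delta = np.full((T, N), -np.inf)   # best log-score ending in each state
    psi = np.zeros((T, N), dtype=int)  # backpointers
    delta[0, start] = log_b[0, start]
    for t in range(1, T):
        for j in range(N):
            scores = delta[t - 1] + log_A[:, j]
            psi[t, j] = scores.argmax()
            delta[t, j] = scores.max() + log_b[t, j]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1]
```

The strictness criticized in the text is visible in the matrix: each block's state depends only on the state of the block immediately above it, so a misassigned block can pull all subsequent assignments off course.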