Introduction
Facial expression plays a crucial role in human verbal and non-verbal communication. In human interaction, the production and perception of facial expressions form a communication channel that is auxiliary to voice and that carries prominent information about the mental, emotional and even physical state of a person. Facial expression is one of several cues, alongside pose, speech, behaviour and actions, that convey information about a person's intentions and emotions. Humans judge the behaviour and emotions of others from facial expressions with very high accuracy. A great deal of research on facial expression recognition has been carried out in recent years, yet recognizing facial expressions with near-perfect accuracy has still not been achieved because of the complexity and variability of expressions.
A facial expression recognition system (FERS) performs three steps: face detection, feature extraction and expression classification (De, Saha, & Pal, 2015). Yang's survey (Yang & Waibel, 1996) groups face detection techniques into four main types: knowledge-based, feature-invariant, template-matching and appearance-based. Many face detection algorithms have been proposed; one of the most widely used was introduced by Paul Viola and Michael J. Jones in 2001. The Viola-Jones method (Agarwal & Khatri, 2015; Viola & Jones, 2001) proved to be a major milestone in face detection. Their model rests on three key contributions. First, they introduced the 'integral image' representation, which allows the features used by their detector to be computed very quickly. Second, they built a simple and efficient classifier using the AdaBoost learning algorithm. Third, they combined classifiers in a 'cascade' that quickly discards background regions of the image while devoting more computation to promising regions containing facial features such as the nose, eyes, chin and mouth.
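The integral image behind the first contribution can be illustrated in a few lines. The sketch below is not Viola and Jones's original code; the function names are assumptions made for the example. Each entry of the integral image holds the sum of all pixels above and to the left, so the sum over any rectangle of Haar-like feature support can then be read off in four array lookups, independent of the rectangle's size.

```python
def integral_image(img):
    """Build a (h+1) x (w+1) integral image with a zero top row and
    left column, so rectangle sums need no boundary checks."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            # Running sum: pixel + sum above + sum left - overlap.
            ii[y + 1][x + 1] = (img[y][x] + ii[y][x + 1]
                                + ii[y + 1][x] - ii[y][x])
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the rectangle with top-left (x, y),
    width w and height h, using four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]
```

A Haar-like feature value is then just the difference of two or more such rectangle sums, which is what makes the detector's feature evaluation so fast.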
The feature extraction process aims to extract from an image or video the specific data that yields relevant information. Accurate and effective recognition is achieved only when adequate features are extracted (Yacoob & Davis, 1996; Essa & Pentland, 1997; Yeasin, Bullot, & Sharma, 2004; Hoey & Little, 2004). Geometric feature-based methods and appearance-based methods (Lai & Ko, 2014) are the two most commonly used approaches to facial feature extraction. Geometric feature-based methods yield comparable or better results than appearance-based approaches in AU identification (Valstar, Patras, & Pantic, 2005; Valstar & Pantic, 2006). However, geometric feature-based methods generally require correct and reliable detection and tracking of crucial points in the face region, which is difficult to achieve in many situations. One of the most widely used appearance-based methods is Local Binary Patterns (LBP). The original Local Binary Pattern operator (Ojala, Pietikäinen, & Harwood, 1996) proved to be a robust means of texture analysis.
LBP was designed principally for texture analysis and description. It is widely used because of its excellent invariance to monotonic illumination changes and its low computational complexity. The basic LBP operator works on a 3 x 3 pixel neighbourhood, using the centre pixel, surrounded by its eight neighbours, as a threshold. Each neighbour is marked 1 if its gray value is greater than or equal to that of the centre pixel, and 0 otherwise; a binary code is then produced by concatenating the marked values anticlockwise. The original operator is limited by its small 3 x 3 neighbourhood, which cannot capture dominant features at larger scales. To overcome this, it was extended to a multi-resolution operator, denoted LBP(P, R), in which P sampling points are placed on a circle of radius R (Shan, Gong, & McOwan, 2009).
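The basic 3 x 3 operator described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name and the exact starting neighbour are assumptions, and any fixed traversal order around the centre yields an equivalent code.

```python
def lbp_code(patch):
    """Basic 3x3 LBP: threshold the eight neighbours against the
    centre pixel and pack the resulting bits into one byte."""
    c = patch[1][1]
    # Fixed anticlockwise traversal around the centre, starting at
    # the top-left neighbour (starting point is an assumption).
    coords = [(0, 0), (1, 0), (2, 0), (2, 1),
              (2, 2), (1, 2), (0, 2), (0, 1)]
    code = 0
    for bit, (r, col) in enumerate(coords):
        # Neighbour >= centre contributes a 1 at this bit position.
        if patch[r][col] >= c:
            code |= 1 << bit
    return code
```

Computing this code at every pixel and histogramming the results over a region gives the texture descriptor that Shan, Gong, and McOwan (2009) apply to facial expression recognition.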