The performance of emotional scene detection depends largely on the accuracy and efficiency of facial expression recognition. While there are various approaches to recognizing facial expressions (Tian, Kanade & Cohn, 2011), our approach is based on geometric features, chosen in consideration of the tradeoff between accuracy and efficiency. Geometric features describe the shapes and locations of salient facial components such as the eyebrows, eyes, and mouth.
For example, 3D face models have been used as geometric features in order to recognize facial expressions accurately (Wang, Yin, Wei & Sun, 2006; Soyel & Demirel, 2007). Such models can be very precise and effective for accurate recognition. In lifelog video retrieval, however, it is difficult to prepare 3D facial features at reasonable cost because of the large amount of video data.
Geometric features can be made more concise by using several salient facial feature points (e.g., the end points of the mouth) on a 2D coordinate system. In this case they are defined as positional relationships among the facial feature points, such as the distance between two points (Esau, Wetzel, Kleinjohann & Kleinjohann, 2007; Hupont, Cerezo & Baldassarri, 2010). We adopt geometric features represented by the positional relationships of a few facial feature points because they are concise and easier to interpret.
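As an illustration of a distance-based geometric feature, the following minimal sketch computes the Euclidean distance between every pair of 2D facial feature points. It is not the feature set used in this work: the function name `pairwise_distance_features`, the landmark names, the coordinates, and the optional normalization by a reference distance are all assumptions made for the example.

```python
import math
from itertools import combinations


def pairwise_distance_features(points, scale_pair=None):
    """Return the Euclidean distance for every pair of named 2D points.

    points:     dict mapping a (hypothetical) landmark name to an (x, y) tuple.
    scale_pair: optional pair of landmark names; if given, every distance is
                divided by the distance between these two points so that the
                features do not depend on face size in the image
                (an assumption of this sketch, not taken from the article).
    """
    scale = 1.0
    if scale_pair is not None:
        first, second = scale_pair
        scale = math.dist(points[first], points[second])
        if scale == 0:
            raise ValueError("reference points must not coincide")

    # Sort the names so each pair always appears in the same order.
    names = sorted(points)
    return {
        (a, b): math.dist(points[a], points[b]) / scale
        for a, b in combinations(names, 2)
    }


# Illustrative coordinates only; a real system would take them
# from a facial feature point detector.
mouth_points = {
    "mouth_left": (0.0, 0.0),
    "mouth_right": (4.0, 0.0),
    "mouth_top": (2.0, 1.5),
}

raw_features = pairwise_distance_features(mouth_points)
scaled_features = pairwise_distance_features(
    mouth_points, scale_pair=("mouth_left", "mouth_right")
)
```

With three points this yields three distances. Each value maps directly to a visible property of the face, such as mouth width, which is the kind of conciseness and understandability the approach above relies on.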