An Emotional Scene Retrieval Framework for Lifelog Videos Using Ensemble Clustering

Hiroki Nomiya (Graduate School of Information Science, Kyoto Institute of Technology, Kyoto, Japan), Atsushi Morikuni (Department of Information Science, Kyoto Institute of Technology, Kyoto, Japan) and Teruhisa Hochin (Graduate School of Information Science, Kyoto Institute of Technology, Kyoto, Japan)
Copyright: © 2015 |Pages: 13
DOI: 10.4018/IJSI.2015070101

Abstract

A lifelog video retrieval framework is proposed for the better utilization of large amounts of lifelog video data. The proposed method retrieves emotional scenes, such as scenes in which a person in the video is smiling, on the assumption that an important event is likely to occur in most emotional scenes. Emotional scenes are detected on the basis of facial expression recognition using a wide variety of facial features. The authors adopt an unsupervised learning approach called ensemble clustering to recognize facial expressions, because supervised learning approaches require sufficient training data, which makes them quite troublesome to apply to large-scale video databases. The retrieval performance of the proposed method is evaluated through an emotional scene detection experiment from the viewpoints of accuracy and efficiency. In addition, a prototype retrieval system is implemented based on the proposed emotional scene detection method.
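The ensemble clustering mentioned in the abstract can be sketched as follows. This is a generic formulation, not the authors' algorithm: several base k-means partitions are combined into a co-association matrix, and a consensus partition is read off from its connected components. All function names and parameters are illustrative assumptions.

```python
import numpy as np

def kmeans(X, k, seed, n_iter=50):
    """Minimal k-means used as a base clusterer (illustrative only)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        # Assign each sample to its nearest center.
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def ensemble_cluster(X, k=2, n_runs=10, threshold=0.5):
    """Consensus clustering via a co-association matrix.

    Two samples are linked if they fall in the same cluster in more
    than `threshold` of the base runs; connected components of that
    graph give the consensus clusters.
    """
    n = len(X)
    coassoc = np.zeros((n, n))
    for seed in range(n_runs):
        labels = kmeans(X, k, seed)
        coassoc += (labels[:, None] == labels[None, :])
    coassoc /= n_runs

    # Label connected components of the thresholded co-association graph.
    consensus = -np.ones(n, dtype=int)
    current = 0
    for i in range(n):
        if consensus[i] == -1:
            stack = [i]
            while stack:
                u = stack.pop()
                if consensus[u] == -1:
                    consensus[u] = current
                    stack.extend(np.flatnonzero(coassoc[u] > threshold))
            current += 1
    return consensus
```

The appeal for lifelog-scale data, as the abstract notes, is that no labeled training examples are required; the consensus step also smooths over unlucky initializations of any single base clustering.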
Article Preview

The performance of emotional scene detection largely depends on the accuracy and efficiency of the facial expression recognition. While there are various approaches to recognizing facial expressions (Tian, Kanade & Cohn, 2011), our approach is based on geometric features, taking into consideration the tradeoff between accuracy and efficiency. Geometric features describe the shape and locations of salient facial components such as the eyebrows, eyes, and mouth.

For example, 3D facial models have been used as geometric features to recognize facial expressions accurately (Wang, Yin, Wei & Sun, 2006; Soyel & Demirel, 2007). Such models can be very precise and effective for accurate recognition. In lifelog video retrieval, however, preparing 3D facial features at reasonable cost is difficult because of the large amount of video data.

The geometric features can be made more concise by using several salient facial feature points (e.g., the end points of the mouth) in a 2D coordinate system. In this case, the features are defined by the positional relationships of the facial feature points, such as the distance between two points (Esau, Wetzel, Kleinjohann & Kleinjohann, 2007; Hupont, Cerezo & Baldassarri, 2010). We adopt geometric features represented by the positional relationships of a few facial feature points because of their conciseness and better understandability.
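As a concrete illustration of such 2D geometric features, the sketch below computes a few distance-based features from facial landmark coordinates and normalizes them by the interocular distance, so the features do not depend on the size of the face in the frame. The landmark names and the particular feature set are assumptions for illustration, not the feature set defined in the paper.

```python
import numpy as np

def geometric_features(landmarks):
    """Distance-based geometric features from 2D facial landmarks.

    `landmarks` maps illustrative point names to (x, y) coordinates.
    All distances are divided by the interocular distance to make
    the features scale-invariant.
    """
    p = {k: np.asarray(v, dtype=float) for k, v in landmarks.items()}
    # Interocular distance used as the normalization scale.
    scale = np.linalg.norm(p["left_eye"] - p["right_eye"])

    def d(a, b):
        return np.linalg.norm(p[a] - p[b]) / scale

    return {
        "mouth_width": d("mouth_left", "mouth_right"),
        "mouth_opening": d("lip_top", "lip_bottom"),
        "left_brow_raise": d("left_brow", "left_eye"),
        "right_brow_raise": d("right_brow", "right_eye"),
    }
```

Because each feature is a single normalized distance between two named points, the resulting representation stays both compact and easy to interpret, which is the motivation stated above for preferring a few 2D feature points over full 3D models.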
