Rank-Pooling-Based Features on Localized Regions for Automatic Micro-Expression Recognition

Trang Thanh Quynh Le, Thuong-Khanh Tran, Manjeet Rege
DOI: 10.4018/IJMDEM.2020100102

Abstract

Facial micro-expression is a subtle, involuntary facial expression of short duration and low intensity through which hidden feelings can be disclosed. The field of micro-expression analysis has been receiving substantial attention due to its potential value in a wide variety of practical applications. A number of studies have proposed sophisticated hand-crafted feature representations to support the task of automatic micro-expression recognition. This paper employs a dynamic image computation method for feature extraction, so that features can be learned on certain localized facial regions, along with deep convolutional networks to identify the micro-expressions present in the extracted dynamic images. The proposed framework is simple compared with existing frameworks that rely on complex hand-crafted feature descriptors. For performance evaluation, the framework is tested on three publicly available databases, as well as on an integrated database in which the individual databases are merged into a single data pool. Impressive results from the series of experiments show that the technique is promising for recognizing micro-expressions.

1. Introduction

Facial micro-expression (ME) is a momentary facial movement that subtly conveys emotion. While recognizing ordinary macro-expressions is relatively effortless (Wang, Peng, Bi et al., 2020) because they are readily apparent, identifying different MEs within a given context is far more difficult (Zhao & Xu, 2019). The first ME was reported in 1966, when Haggard and Isaacs observed one in a filmed interview (Haggard & Isaacs, 1966), and the phenomenon soon became well established in psychology. An ME typically occurs when a person, either unconsciously or deliberately, conceals genuine feelings in order to achieve personal goals or avoid danger (Ekman, 2009; Li et al., 2017). In contrast to macro-expressions, MEs last only a transient moment and involve very slight facial muscular changes (Ekman, 2009; Li et al., 2018). Discerning a spontaneous ME in any particular situation is therefore a difficult and complicated task.

The ME analysis task is fundamentally partitioned into spotting and recognition: an ME is first spotted (detected) in an input video sequence (Li et al., 2017) and then categorized into one of several predefined emotion labels (Oh et al., 2018). Despite the complexity of analyzing MEs, researchers have studied them extensively with computational methods, as the task is promising and applicable in many disciplines such as security systems, clinical diagnosis, and forensic investigation (Ekman, 2009). Various tools have been developed to help humans distinguish different MEs, but their performance has been insufficient; the best training tool, METT, achieved only 40% of MEs correctly recognized (Seidenstat & Splane, 2009). As a result, ME research has turned to cutting-edge computer vision techniques to automate the task and achieve better outcomes.

Deep learning has rapidly emerged as a leading approach for solving challenging problems. With their high-level feature representations, deep neural networks have advanced many automated tasks, including micro-expression recognition. Patel et al. (2016) was the first work to apply deep learning to learning rich features. They used transfer learning, taking models pre-trained on ImageNet for feature extraction; a feature selection step then retained only relevant information before the features were fed into a convolutional network for classification. The proposed framework achieved a 56.3% accuracy rate. Aside from feature learning, many studies have employed deep learning in ME recognition as a classifier. In (Li et al., 2018), a VGG-Face model was run on features learned exclusively from the single apex frame of each input sequence. They assumed that apex frames contain the most important facial information and that discarding the remaining frames avoids additional noise, leading to a better outcome; this method obtained a higher recognition rate of 63.3%. Another work adopted different CNN architectures to extract low-level spatio-temporal features while taking expression states into account: spatial features were encoded along with their expression states by a CNN and then used to learn the corresponding temporal texture, and an LSTM network subsequently classified emotions with 60.98% accuracy. Other research has also sought to improve automatic micro-expression recognition with different approaches and fine-tuned networks; nevertheless, the results have not been satisfactory (Oh et al., 2018). The inherently difficult characteristics of MEs, namely low intensity and short duration (Li et al., 2018), together with the scarcity of ME data (Oh et al., 2018), account for the limited performance of previous methods.

In this paper, we introduce a simple, compact, yet effective framework for automatic facial ME recognition, as illustrated in Figure 1. By applying rank-pooling-based dynamic images to capture the gist of video sequences, we reduce the number of processing steps and turn ME identification into a single RGB image classification problem. After feature extraction, the resulting dynamic images are further processed by the convolutional layers of subsequent deep convolutional networks, which serve both as feature descriptors and as the final ME classifier. The implementation is simple compared to other, more intricate methods based on low-level feature extraction techniques.
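
To make the dynamic-image step concrete, the sketch below shows one common closed-form variant, approximate rank pooling, in which each frame of a clip receives a fixed weight and the weighted sum is rescaled to an 8-bit image. It is a minimal illustration only: the frame layout, the function name, and the specific weighting (the harmonic-number form popularized by Bilen et al., 2016) are assumptions for exposition and are not necessarily the exact variant used in our framework.

import numpy as np

def approximate_rank_pooling(frames: np.ndarray) -> np.ndarray:
    """Collapse a clip of shape (T, H, W, C) into one dynamic image (H, W, C).

    Illustrative sketch using the closed-form approximate rank pooling weights
    alpha_t = 2*(T - t + 1) - (T + 1) * (H_T - H_{t-1}), where H_t is the
    t-th harmonic number (as popularized by Bilen et al., 2016). Not
    necessarily the exact pooling variant used in the paper.
    """
    T = frames.shape[0]
    t = np.arange(1, T + 1)
    harmonics = np.cumsum(1.0 / t)                      # H_1 .. H_T
    H_prev = np.concatenate(([0.0], harmonics[:-1]))    # H_0 .. H_{T-1}
    alphas = 2.0 * (T - t + 1) - (T + 1) * (harmonics[-1] - H_prev)

    # Temporally weighted sum of frames, then rescaled to a displayable image.
    dyn = np.tensordot(alphas, frames.astype(np.float64), axes=(0, 0))
    dyn -= dyn.min()
    if dyn.max() > 0:
        dyn *= 255.0 / dyn.max()
    return dyn.astype(np.uint8)

# Example usage on a synthetic 20-frame clip (random pixels, hypothetical sizes):
if __name__ == "__main__":
    clip = np.random.randint(0, 256, size=(20, 64, 64, 3), dtype=np.uint8)
    dynamic_image = approximate_rank_pooling(clip)
    print(dynamic_image.shape)  # (64, 64, 3)

The single image produced this way can then be passed to an ordinary image classifier, which is what allows the pipeline to be treated as RGB image classification.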
