Automatic emotion recognition is becoming a focus in interactive technological systems. Indeed, there is a rising need for emotional state recognition in several domains, such as psychiatric diagnosis, video games, human-computer interaction, and even lie detection (Cowie et al., 2001). The lack of a standard for modeling human emotions hinders the sharing of affective information between applications. Current work on the modeling and annotation of emotional states (e.g., Emotion Markup Language (EmotionML) (Schroder et al., 2011), Emotion Annotation and Representation Language (EARL) (The HUMAINE Association, 2006)) aims to provide a standard for emotion exchange between applications, but these languages define emotions using natural-language terms: they use words instead of concepts. For example, in EARL, joy would be represented by the string `<emotion category="joy">`, which uses the English word for the concept of joy rather than the concept itself, which could be expressed in any language (e.g., joie, farah, gioia). Our goal is to provide a multimodal system for emotion recognition and exchange that facilitates inter-system exchange and improves the credibility of emotional interaction between users and computers. In previous work we proposed a Three-Layer Model (Tayari Meftah, Thanh, & Ben Amar, 2011) for emotion exchange composed of three distinct layers: the psychological layer, the formal computational layer, and the language layer (cf. Figure 1). In this study we enrich this model by adding a multimodal recognition module capable of estimating the human emotional state by analyzing and fusing a number of cues provided through speech, facial expressions, physiological changes, etc.
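The distinction between words and concepts can be illustrated with a minimal sketch: a language-independent concept identifier is stored and exchanged, while language-specific labels are resolved only at presentation time. All names below (`EMOTION_CONCEPTS`, the identifier scheme) are hypothetical illustrations, not part of EARL or EmotionML.

```python
# Hypothetical sketch: exchange a language-independent concept identifier,
# not an English word; localized labels are looked up only for display.
EMOTION_CONCEPTS = {
    "concept:joy":   {"en": "joy",   "fr": "joie",   "ar": "farah", "it": "gioia"},
    "concept:anger": {"en": "anger", "fr": "colère", "it": "rabbia"},
}

def label(concept_id: str, lang: str = "en") -> str:
    """Resolve a concept identifier to a label in the requested language."""
    return EMOTION_CONCEPTS[concept_id][lang]

print(label("concept:joy", "fr"))  # -> joie
print(label("concept:joy", "it"))  # -> gioia
```

Under such a scheme, two applications agree only on the concept identifiers, so the same emotional state can be rendered in any supported language without re-annotation.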
Real-life emotions are often complex, and people naturally communicate multimodally by combining language, tone, facial expression, and gesture (Friesen & Ekman, 2009; Scherer, 1998). Indeed, the complexity of emotions makes their acquisition very difficult and makes unimodal systems (i.e., those observing only one source of emotion) unreliable and often unfeasible in applications of high complexity. For example, a person can attempt to regulate her facial expression to hide the emotion she truly feels. If we analyze only her facial expression, we may detect joy while she actually feels anger and is trying to conceal it. Thus, to improve recognition accuracy, an emotion recognition system needs to process, extract, and analyze a variety of cues provided through speech, facial expressions, physiological changes, etc.
In this paper, we present a new multimodal approach for emotion recognition that integrates information from different modalities in order to allow more reliable estimation of emotional states. Our contributions concern two points. First, we propose a generic computational model for the representation of emotional states and for better multimodal analysis. Second, we propose a multimodal biometric emotion recognition method. This method combines four modalities: electromyography (EMG), electrodermal activity (galvanic skin response, GSR), blood volume pulse (BVP), and respiration (cf. Figure 2). For each modality we apply a new unimodal emotion recognition method based on signal processing techniques.
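The combination of per-modality recognizers can be sketched as a simple decision-level fusion: each unimodal classifier is assumed to output a probability distribution over emotion classes, and the distributions are combined by weighted averaging. This is an illustrative sketch only; the class set, weights, and example probabilities below are hypothetical, not the method or data of this paper.

```python
# Hypothetical decision-level fusion over the four physiological modalities
# named above (EMG, GSR, BVP, respiration). Each unimodal classifier is
# assumed to emit a probability per class; fusion is a weighted average.
CLASSES = ["joy", "anger", "sadness", "fear"]

def fuse(unimodal_probs: dict, weights: dict) -> str:
    """Return the class maximizing the weighted average of modality scores."""
    n = len(CLASSES)
    total = [0.0] * n
    for modality, probs in unimodal_probs.items():
        w = weights[modality]
        for i in range(n):
            total[i] += w * probs[i]
    return CLASSES[max(range(n), key=lambda i: total[i])]

# Illustrative per-modality outputs (probabilities over CLASSES):
probs = {
    "emg":  [0.10, 0.60, 0.20, 0.10],
    "gsr":  [0.20, 0.50, 0.20, 0.10],
    "bvp":  [0.30, 0.40, 0.20, 0.10],
    "resp": [0.25, 0.35, 0.25, 0.15],
}
weights = {"emg": 0.25, "gsr": 0.25, "bvp": 0.25, "resp": 0.25}
print(fuse(probs, weights))  # -> anger
```

Other fusion strategies (feature-level concatenation, majority voting, learned combiners) fit the same interface; the weighted average is shown only because it is the simplest to state.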
Figure 2. The global scheme of the proposed method.
The remainder of this paper is organized as follows. In Section 2, we review relevant psychological and linguistic theories of emotion, then present related work in the field of automatic emotion recognition. In Section 3, we perform emotion recognition in two stages: unimodal and multimodal. We first describe a unimodal emotion recognition method based on signal processing algorithms, and then describe the details of the proposed multimodal approach. Section 4 provides the experimental results. Finally, we conclude in Section 5.