Automated Recognition of Emotion Appraisals

Marcello Mortillaro (University of Geneva (Swiss Center for Affective Sciences), Switzerland), Ben Meuleman (University of Geneva (Swiss Center for Affective Sciences), Switzerland) and Klaus R. Scherer (University of Geneva (Swiss Center for Affective Sciences), Switzerland)
DOI: 10.4018/978-1-4666-7278-9.ch016


Most computer models for the automatic recognition of emotion from nonverbal signals (e.g., facial or vocal expression) have adopted a discrete emotion perspective, i.e., they output a categorical emotion from a limited pool of candidate labels. The discrete perspective suffers from practical and theoretical drawbacks that limit the generalizability of such systems. The authors of this chapter propose instead to adopt an appraisal perspective in modeling emotion recognition, i.e., to infer the subjective cognitive evaluations that underlie both the nonverbal cues and the overall emotion states. In a first step, expressive features would be used to infer appraisals; in a second step, the inferred appraisals would be used to predict an emotion label. The first step is practically unexplored in the emotion literature. Such a system would make it possible to (a) link models of emotion recognition and production, (b) add contextual information to the inference algorithm, and (c) detect subtle emotion states.
Chapter Preview


Hundreds of studies have investigated the emotional meaning of nonverbal signals, and most of them implicitly or explicitly used a discrete emotion perspective (Scherer, Clark-Polner, & Mortillaro, 2011), i.e., the assumption that each emotion is a qualitatively different entity from the others. Discrete emotion theory was formulated on the basis of findings concerning a few intense emotions, called basic emotions, that are expected to have prototypical facial expressions and physiological signatures (Ekman, 1992, 1999; Ekman, Levenson, & Friesen, 1983; Ekman, Sorenson, & Friesen, 1969). This theory has dominated the field for decades and is still widely used.

Most attempts at computer recognition of emotion from nonverbal expressions (one of the goals of research on affective computing; Picard, 1997) have adopted the same discrete perspective. Typically, such models attempt to automatically detect almost invariant configurations that are supposed to be prototypical of certain emotion categories, for example by matching a facial expression to a set of stored templates.
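To make the template-matching idea concrete, the following is a minimal sketch in Python. It assumes, purely for illustration, that a facial expression has already been reduced to a short vector of feature intensities; the labels, the template values, and the function name `classify_by_template` are hypothetical and are not taken from any system cited in this chapter.

```python
import math

# Hypothetical stored templates: one prototypical feature vector per
# emotion label. The numbers are invented for illustration only.
TEMPLATES = {
    "joy": [0.9, 0.1, 0.0],
    "anger": [0.0, 0.9, 0.2],
    "surprise": [0.1, 0.0, 0.9],
}


def classify_by_template(features):
    """Return the label whose stored template is nearest (Euclidean distance)."""
    return min(
        TEMPLATES,
        key=lambda label: math.dist(features, TEMPLATES[label]),
    )


# A near-prototypical expression is assigned its expected label.
print(classify_by_template([0.8, 0.2, 0.1]))

# An ambiguous expression is still forced into exactly one category,
# because the classifier can only return one of the stored labels.
print(classify_by_template([0.4, 0.4, 0.4]))
```

Whatever the input, such a classifier returns one label from the fixed pool, which is the defining property of the discrete approach described above.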

The results of a recent challenge for emotion recognition systems showed that the automatic classification of facial expressions of emotion into discrete categories is technically feasible, assuming the availability of an appropriate ground truth (Valstar et al., 2012). One problem is that the discrete approach (template matching) transfers poorly to real-world expressions and applications. Studies showed that, in everyday communication, prototypical expressions do not occur very often and that the interpretation of nonverbal cues is heavily influenced by context (Aviezer, Trope, & Todorov, 2012; Carroll & Russell, 1996). This recent evidence in psychology calls for a similar paradigm shift in the field of automatic detection of emotion. Ideally, this shift should involve both the detection and the inference part of recognition systems.

Indeed, emotion recognition systems can be conceived as consisting of two parts, a detection component and an inference component. The detection component analyzes the facial movements; the inference component attributes an emotional meaning to the movements detected by the first component. More recent automated models of detection have abandoned the template-matching approach and now focus on the automatic detection of action units (Valstar, Mehu, Pantic, & Scherer, 2012). The Facial Action Coding System (FACS, Ekman & Friesen, 1978) is the recognized standard for the coding of facial movements, and researchers are now trying to implement an automated version capable of detecting each movement shown by a face. Current results are promising, and we can expect that, in the near future, these systems will become fully reliable and perform in a satisfactory way. As the detection problem is being solved, attention should now turn to which model is best suited to attributing an emotional meaning. We propose that the inference component should be modeled after emotion approaches that offer greater flexibility across different contexts and allow the inference of partial emotion information, i.e., appraisals.
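The two-step inference proposed here (detected movements to appraisals, then appraisals to an emotion label) can be sketched as follows. This is only an illustrative outline under stated assumptions: the action unit names are used as plain dictionary keys, and the appraisal dimensions, weights, emotion profiles, and function names are all hypothetical placeholders rather than values or methods reported in this chapter.

```python
# Appraisal dimensions used in this sketch (an assumed, simplified set).
APPRAISALS = ("pleasantness", "novelty", "goal_obstruction")

# Step 1 (hypothetical weights): how strongly each detected action unit
# contributes to each appraisal score.
AU_TO_APPRAISAL = {
    "AU12": {"pleasantness": 0.9},
    "AU1": {"novelty": 0.7},
    "AU4": {"goal_obstruction": 0.8},
}

# Step 2 (hypothetical profiles): a typical appraisal pattern per label.
EMOTION_PROFILES = {
    "joy": {"pleasantness": 1.0, "novelty": 0.3, "goal_obstruction": 0.0},
    "anger": {"pleasantness": 0.0, "novelty": 0.2, "goal_obstruction": 1.0},
    "surprise": {"pleasantness": 0.5, "novelty": 1.0, "goal_obstruction": 0.0},
}


def infer_appraisals(au_intensities):
    """Step 1: map action unit intensities (0..1) to appraisal scores (0..1)."""
    scores = {name: 0.0 for name in APPRAISALS}
    for au, intensity in au_intensities.items():
        for name, weight in AU_TO_APPRAISAL.get(au, {}).items():
            scores[name] = min(1.0, scores[name] + weight * intensity)
    return scores


def predict_emotion(appraisals):
    """Step 2: return the label whose appraisal profile is closest."""
    def squared_distance(profile):
        return sum((appraisals[name] - profile[name]) ** 2 for name in APPRAISALS)

    return min(EMOTION_PROFILES, key=lambda label: squared_distance(EMOTION_PROFILES[label]))


# The detection component is assumed to have produced these intensities.
detected = {"AU4": 1.0}
appraisals = infer_appraisals(detected)
print(appraisals)                   # intermediate, partial emotion information
print(predict_emotion(appraisals))  # optional categorical label
```

The point of the design is that the intermediate appraisal scores remain available on their own, so a system could report them, or adjust them with contextual information, without committing to a categorical label.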

Key Terms in this Chapter

Emotion Elicitation: To trigger an emotion.

Appraisal: The cognitive evaluation of an event/object, automatic or controlled.

Facial Action Unit: The most basic independent visible movement of the facial musculature.

Non-Verbal Behavior: All signs of a person that are visible in the face and the body or audible in the voice (excluding verbal content).

Facial Expression: The combination of all the movements (contraction and extension) of the muscles of the face at a definite moment.

Emotion: A brief episode of synchronized changes in several organismic subsystems, which happens in consequence of an evaluation of an object/event as relevant, and causes the organism to react.
