Introduction
Affective computing encompasses various classification challenges, such as the affective annotation of images, video, or written text, and the recognition of affective states from sensor data. In this work we focus on the subdomain of emotion classification from sensor data, which can be based on many different input signals. The field is dominated by three main input types, all measuring signals from the human body: video, audio, and physiology. Emotional states can be derived from video through facial expressions, postures, or movements (Gunes and Piccardi, 2009; Sanchez et al., 2010; Xiao et al., 2011; Van Kuilenburg et al., 2008); from audio through utterances (Sobol-Shikler and Robinson, 2010; Van den Broek et al., 2011; Wu et al., 2011); and from physiology through a variety of bodily signals such as cardiac activity, skin conductance, and respiration (Chanel et al., 2009; Hosseini et al., 2010; Van den Broek et al., 2010). We refer to Janssen et al. (2013a) for a more comprehensive overview of studies that use one or more of these input signals for emotion classification.
Despite various advances in the field, classification performance generally falls below that achieved in other domains of automated classification, such as fingerprint recognition or restricted cases of handwriting recognition. This indicates that affective classification tasks are generally difficult. We also observe that most automated affect recognition systems use a one-step approach, directly mapping measured features to emotion labels. As suggested by Ptaszynski et al. (2009), integrating contextual information in a multi-step approach might aid the interpretation of the various input signals. Because such an approach could apply to a wide range of affect recognition systems, it calls for a structured approach.
We hypothesize that models from appraisal theory might provide such a reasoning framework, in which factual contextual information can be combined with other sensor data that carry information about the personal interpretation of the person being measured. We propose a system that uses an appraisal model in a two-step approach: the first step maps measurement data to an appraisal representation, and the second step maps the appraisal representation onto emotion labels. Several appraisal theories have been proposed (Ortony et al., 1988; Scherer, 2001; Frijda, 1987; Lazarus, 1991; Marsella et al., 2010); they have in common that the process of appraising stimuli is considered person-dependent, whereas the generation of emotion from an appraisal is considered person-independent. This means that interpersonal differences can be accounted for in the first step of the proposed two-step approach, while the second step is independent of the user of the system.
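The two-step architecture described above can be illustrated with a minimal sketch. All sensor feature names, calibration weights, appraisal dimensions, and emotion prototypes below are hypothetical placeholders, not values from this study: step one applies a person-dependent linear mapping from normalized sensor features to appraisal dimensions, and step two assigns an emotion label via a person-independent nearest-prototype rule in appraisal space.

```python
# Illustrative sketch of the two-step approach:
# sensor features -> appraisal dimensions -> emotion label.
# All names, weights, and prototypes are hypothetical assumptions.

def sensors_to_appraisal(features, person_weights):
    """Step 1 (person-dependent): linear mapping of sensor
    features to appraisal dimensions, using per-person weights."""
    return {
        dim: sum(weights[f] * value for f, value in features.items())
        for dim, weights in person_weights.items()
    }

# Step 2 is person-independent: fixed emotion prototypes in a
# two-dimensional appraisal space (illustrative dimensions only).
PROTOTYPES = {
    "joy":  {"pleasantness":  1.0, "arousal": 0.6},
    "fear": {"pleasantness": -0.8, "arousal": 0.9},
    "calm": {"pleasantness":  0.4, "arousal": 0.1},
}

def appraisal_to_emotion(appraisal):
    """Step 2 (person-independent): nearest-prototype classification
    by squared Euclidean distance in appraisal space."""
    def sq_dist(proto):
        return sum((appraisal[d] - proto[d]) ** 2 for d in proto)
    return min(PROTOTYPES, key=lambda label: sq_dist(PROTOTYPES[label]))

# Usage: hypothetical per-person calibration weights for two
# normalized physiological features.
person_weights = {
    "pleasantness": {"heart_rate": -0.5, "skin_conductance": -0.3},
    "arousal":      {"heart_rate":  0.6, "skin_conductance":  0.7},
}
features = {"heart_rate": 0.9, "skin_conductance": 0.8}
label = appraisal_to_emotion(sensors_to_appraisal(features, person_weights))
# High arousal with negative pleasantness maps to the "fear" prototype.
```

Only the weights in step one would need to be recalibrated for a new user; the prototypes in step two stay fixed, mirroring the person-dependent/person-independent split proposed by the appraisal theories cited above.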
Regarding the mapping of sensor data to appraisal, several studies have shown that various appraisal dimensions can be derived from, e.g., physiological measurements (Aue et al., 2007; Grandjean and Scherer, 2008; Bradley et al., 1993; Smith, 1989; van Reekum et al., 2004). The second step, mapping appraisals to emotion labels, has been studied to some extent by the authors of appraisal models (Scherer, 1993; Scherer et al., 2006), but apart from one recent publication (Meuleman and Scherer, 2013) these studies have not involved the more sophisticated classification techniques from machine learning research. Our aim is to investigate the potential of appraisal models in a two-step approach to emotion classification. In the present study we provide an independent assessment of one such appraisal model using a variety of machine learning techniques, and we use visualization techniques to gain further insight into the classification task.