Speech Emotion Recognition Based on Gender Influence in Emotional Expression

P Vasuki (SSN College of Engineering, Chennai, India) and Divya Bharati R (State Bank of India, Mumbai, India)
Copyright: © 2019 |Pages: 19
DOI: 10.4018/IJIIT.2019100102

Abstract

The real challenge in human-computer interaction is enabling machines to understand human emotions and respond to them appropriately. Emotional expression varies with the speaker's gender, age, location, and the cause of the emotion. This article focuses on improving emotion recognition (ER) from speech by exploiting gender-based influences on emotional expression. The problem is addressed by testing emotional speech with a gender-specific ER system. Because acoustic characteristics vary between genders, there may not be a common optimal feature set across both. A two-level hierarchical, gender-based speech emotion recognition system is proposed: the first level identifies the speaker's gender, and the second level is a gender-specific ER system trained with an optimal feature set for that gender's expressions. On the EMO-DB corpus, the proposed system improves the accuracy of a traditional Speech Emotion Recognition (SER) system by 10.36% over an SER trained on mixed-gender data.
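The two-level hierarchy described in the abstract can be sketched as a routing pipeline. The sketch below uses hypothetical stand-in classifiers (a pitch threshold for gender, gender-specific energy thresholds for emotion); the actual system trains real models on gender-specific optimal feature sets, and all thresholds and feature names here are illustrative assumptions.

```python
# Minimal sketch of the proposed two-level hierarchical SER pipeline.
# All classifiers and thresholds are illustrative stand-ins, not the
# trained models or features used in the article.

def classify_gender(features):
    # Level 1 stand-in: mean-pitch threshold (assumed ~165 Hz split).
    return "female" if features["pitch_hz"] > 165 else "male"

def make_emotion_classifier(anger_energy_threshold):
    # Level 2 stand-in: one classifier per gender, each with its own
    # gender-specific operating point on an energy feature.
    def classify(features):
        return "anger" if features["energy"] > anger_energy_threshold else "neutral"
    return classify

# Gender-specific emotion models with different decision thresholds,
# reflecting that no single operating point fits both genders.
emotion_models = {
    "male": make_emotion_classifier(anger_energy_threshold=0.6),
    "female": make_emotion_classifier(anger_energy_threshold=0.75),
}

def hierarchical_ser(features):
    gender = classify_gender(features)       # level 1: gender identification
    return emotion_models[gender](features)  # level 2: gender-specific ER

sample = {"pitch_hz": 210.0, "energy": 0.7}
print(hierarchical_ser(sample))  # routed to the female model -> "neutral"
```

Note that the same energy value (0.7) would be labeled "anger" by the male model but "neutral" by the female model, which is exactly the kind of gender-dependent decision boundary that motivates the two-level design.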

I. Introduction

Speech is the fastest and most natural means of communication between humans. For efficient interaction between human and machine, the machine must have sufficient intelligence to understand the information content in speech (Ayadi et al., 2011). This understanding can be further improved if the emotional state of the speaker is also known (Nwe et al., 2003).

The semantics of spoken words vary with emotional context, so identifying the speaker's emotional state is significant in speech technology. Speech Emotion Recognition (SER) is especially useful for applications that require natural man-machine interaction, such as computer tutoring systems and interactive speech bots like Alexa; the responses of such systems are more productive if they understand the user's emotions. SER is also useful in automatic translation systems, where the speaker's emotional state plays a vital role in identifying the exact meaning of a phrase. SER has further been applied in call centre and mobile communication applications (Petrushin, 1999), where client satisfaction is factored into employees' appraisals.

Research on Speech Emotion Recognition (SER) has been evolving since the mid-1990s (Dellaert et al., 1995), and SER has many socially relevant applications. SER-based systems can serve humans according to their emotional state on various occasions, for example by playing music that matches the listener's mood. From a psychological point of view, SER can be used to monitor a person's homeostatic balance and give feedback accordingly (Lee & Busso, 2013). The performance of dialog-based applications and question-answering systems may be improved by incorporating emotions into the conversation (Burkhardt et al., 2009). In usability analysis, an interactive application can capture the speaker's feelings towards a product and the user-friendliness of the application. Interactive gaming applications have been developed to record and analyze the emotions elicited during play; this supports research on the role of particular games in stimulating user emotions and can serve psychologists in various analyses.

SER is affected by various factors such as the recording environment, the acoustic and cultural background, and the age and gender of the speaker. According to the literature, culture and gender roles have a strong impact on emotional expression (Wester et al., 2002; Wang, 2018; Kamaruddin et al., 2012). Researchers have used gender information to enhance emotion recognition accuracy (Devika et al., 2016; Fu & Wang, 2010) and emotion information for gender recognition (Chen, Gu, Lu, & Ke, 2012; Safavi et al., 2018). One possible explanation for gender differences in emotional expressiveness is social: men and women are taught by social and cultural standards to express emotions differently (Derks et al., 2008). Empirical evidence from many places suggests that girls are socialized to be emotionally expressive, non-aggressive, nurturing, and obedient, whereas boys are socialized to be unemotional, aggressive, achievement-oriented, and self-reliant. In many countries, women are often presumed to express happiness, while men are not expected to be expressive (Wester et al., 2002).

Moreover, the physical characteristics of the male and female sound-production systems differ: variations in vocal tract length and in vocal cord size and dimensions change the glottal closure period and the formant frequencies of men and women. The acoustic characteristics of male and female emotional speech therefore differ in the ranges of features such as pitch, intensity, energy, and formant frequencies. Thus, a common emotion model for both genders may not give accurate results, since its training must also absorb parameter variation that is due to gender rather than emotion.
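The pitch difference mentioned above is the most directly measurable of these gender-dependent features. As a rough illustration, the sketch below estimates fundamental frequency (F0) by autocorrelation peak picking on synthetic tones in typical male (~120 Hz) and female (~220 Hz) pitch ranges; it is a toy estimator on clean sinusoids, not the feature-extraction method used in the article, and real speech would need framing, voicing detection, and a more robust F0 tracker.

```python
import math

def estimate_f0(signal, sample_rate, f0_min=60.0, f0_max=400.0):
    """Toy F0 estimate: pick the autocorrelation peak in the search band."""
    lag_min = int(sample_rate / f0_max)   # shortest period considered
    lag_max = int(sample_rate / f0_min)   # longest period considered
    best_lag, best_corr = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        corr = sum(signal[i] * signal[i - lag] for i in range(lag, len(signal)))
        if corr > best_corr:
            best_corr, best_lag = corr, lag
    return sample_rate / best_lag

sr = 8000  # Hz
# 100 ms synthetic tones in typical male / female pitch ranges.
male_like = [math.sin(2 * math.pi * 120 * t / sr) for t in range(sr // 10)]
female_like = [math.sin(2 * math.pi * 220 * t / sr) for t in range(sr // 10)]

print(round(estimate_f0(male_like, sr)))    # near 120 Hz
print(round(estimate_f0(female_like, sr)))  # near 220 Hz
```

Because the male and female tones land in clearly separated F0 bands, even this crude estimate suffices to illustrate why pitch-derived features behave differently across genders, and hence why a single mixed-gender emotion model conflates gender variation with emotional variation.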
