Neural Networks for Language Independent Emotion Recognition in Speech

Yongjin Wang (University of Toronto, Canada), Muhammad Waqas Bhatti (University of Sydney, Australia) and Ling Guan (Ryerson University, Canada)
DOI: 10.4018/978-1-60566-902-1.ch025


This chapter introduces a neural network based approach for identifying human affective state in speech signals. A group of potential features is first identified and extracted to represent the characteristics of different emotions. To reduce the dimensionality of the feature space while increasing the discriminatory power of the features, a systematic feature selection approach is presented that applies sequential forward selection (SFS) with a general regression neural network (GRNN) in conjunction with a consistency-based selection method. The selected parameters are employed as inputs to a modular neural network consisting of sub-networks, where each sub-network specializes in a particular emotion class. Compared with a standard neural network, this modular architecture allows a complex classification problem to be decomposed into small subtasks, so that the network may be tuned based on the characteristics of each individual emotion. The performance of the proposed system is evaluated for various subjects speaking different languages. The results show that the system produces quite satisfactory emotion detection performance while also demonstrating a significant increase in versatility through its propensity for language independence.
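The feature selection step described above can be illustrated with a minimal sketch. This is not the chapter's implementation: it assumes a GRNN in its usual form (a Gaussian-kernel weighted average of training targets), uses leave-one-out classification accuracy as the selection criterion, and omits the consistency-based selection method entirely. The function names, the smoothing parameter `sigma`, and the stopping rule (a fixed number of features) are all hypothetical choices made for illustration.

```python
import numpy as np


def loo_grnn_accuracy(X, labels, n_classes, sigma=0.5):
    """Leave-one-out accuracy of a GRNN-style classifier (illustrative).

    Each sample is scored by a Gaussian-kernel weighted average of the
    one-hot targets of all *other* samples; the predicted class is the
    one with the largest averaged score.
    """
    Y = np.eye(n_classes)[labels]                      # one-hot targets
    # Pairwise squared Euclidean distances, shape (n, n)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-d2 / (2.0 * sigma ** 2))               # kernel weights
    np.fill_diagonal(w, 0.0)                           # leave each sample out
    scores = (w @ Y) / (w.sum(axis=1, keepdims=True) + 1e-12)
    return float((scores.argmax(axis=1) == labels).mean())


def sequential_forward_selection(X, labels, n_classes, n_select, sigma=0.5):
    """Greedy SFS: repeatedly add the feature that most improves accuracy.

    Assumption: selection stops after `n_select` features; the chapter may
    use a different stopping rule.
    """
    selected = []
    remaining = list(range(X.shape[1]))
    while len(selected) < n_select and remaining:
        best_feat, best_acc = None, -1.0
        for f in remaining:
            acc = loo_grnn_accuracy(X[:, selected + [f]], labels,
                                    n_classes, sigma)
            if acc > best_acc:
                best_feat, best_acc = f, acc
        selected.append(best_feat)
        remaining.remove(best_feat)
    return selected
```

Given a feature matrix `X` of shape (samples, features) and an integer array `labels`, a call such as `sequential_forward_selection(X, labels, n_classes=6, n_select=10)` would return the indices of the chosen features in the order they were added. In practice the features would normally be standardized first, since the kernel width `sigma` is shared across all dimensions.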
Chapter Preview


General Background

As computers have become an integral part of our lives, the need has arisen for a more natural communication interface between humans and machines. To accomplish this goal, a computer would have to be able to perceive its present situation and respond differently depending on that perception. Part of this process involves understanding a user’s emotional state. To make human-computer interaction (HCI) more natural, it would be beneficial to give computers the ability to recognize situations the same way a human does.

A good reference model for emotion recognition is the human brain. Machine recognition of human emotion involves a strong combination of informatics and cognitive science. The difficulty of this problem is rooted in understanding the mechanisms of natural intelligence and the cognitive processes of the brain, the subject of Cognitive Informatics (Wang, 2003). For effective recognition of human emotion, important information needs to be extracted from the captured emotional data to mimic the way humans distinguish different emotions, and the processed information then needs to be classified in a way that simulates how the human brain performs pattern recognition.

In the field of HCI, speech is primary to the objectives of an emotion recognition system, as are facial expressions and gestures. Speech is considered a powerful mode for communicating intentions and emotions. This chapter explores methods by which a computer can recognize human emotion in the speech signal. Such methods can contribute to human-computer communication and to applications such as learning environments, consumer relations, entertainment, and educational software (Picard, 1997).

A great deal of research has been done in the field of speech recognition, where the computer analyzes an acoustic signal and maps it into a set of lexical symbols. In this case, much of the emphasis is on the segmental aspect of speech, that is, looking at each individual sound segment of the input signal and comparing it with known patterns that correspond to different consonants, vowels and other lexical symbols. In emotion recognition, the lexical content of the utterance is insignificant because two sentences could have the same lexical meaning but different emotional information.

Emotions have been the object of intense interest in both Eastern and Western philosophy since before the time of Lao-Tzu (sixth century B.C.) in the east and of Socrates (470-399 B.C.) in the west, and most contemporary thinking about emotions in psychology can be linked to one Western philosophical tradition or another (Calhoun & Solomon, 1984). However, modern scientific inquiry into the nature of emotion is thought by many to have begun with Charles Darwin’s study of emotional expression in animals and humans (Darwin, 1965). A survey of contemporary research on emotion in psychology reveals four general perspectives on defining, studying, and explaining emotion (Cornelius, 1996): the Darwinian, the Jamesian, the cognitive, and the social constructivist. Each of these perspectives represents a different way of thinking about emotions. Each has its own set of assumptions about how to define, construct theories about, and conduct research on emotion, and each has associated with it its own tradition of research (Ekman & Sullivan, 1987; Levenson, Ekman, & Friesen, 1990; Smith & Kleinman, 1989; Smith & Lazarus, 1993).

Broad investigation into the dimensions of emotion indicates that at least six emotions are universal. Several other emotions, and many combinations of emotions, have been studied but remain unconfirmed as universally distinguishable. The six principal emotions, which are the focus of study in this chapter, are happiness, sadness, anger, fear, surprise, and disgust.
