General Background
As computers have become an integral part of our lives, the need has arisen for a more natural communication interface between humans and machines. To accomplish this goal, a computer would have to be able to perceive its present situation and respond differently depending on that perception. Part of this process involves understanding a user’s emotional state. To make human-computer interaction (HCI) more natural, it would be beneficial to give computers the ability to recognize situations the same way a human does.
A good reference model for emotion recognition is the human brain. Machine recognition of human emotion involves a strong combination of informatics and cognitive science. The difficulty of the problem is rooted in understanding the mechanisms of natural intelligence and the cognitive processes of the brain, which is the subject of Cognitive Informatics (Wang, 2003). For effective recognition of human emotion, important information must be extracted from the captured emotional data in a way that mimics how humans distinguish different emotions, and the processed information must then be classified by simulating the pattern recognition performed by the human brain.
In the field of HCI, speech is central to the objectives of an emotion recognition system, as are facial expressions and gestures. Speech is considered a powerful mode for communicating intentions and emotions. This chapter explores methods by which a computer can recognize human emotion in the speech signal. Such methods can contribute to human-computer communication and to applications such as learning environments, consumer relations, entertainment, and educational software (Picard, 1997).
A great deal of research has been done in the field of speech recognition, where the computer analyzes an acoustic signal and maps it into a set of lexical symbols. In this case, much of the emphasis is on the segmental aspect of speech, that is, looking at each individual sound segment of the input signal and comparing it with known patterns that correspond to different consonants, vowels, and other lexical symbols. In emotion recognition, by contrast, the lexical content of the utterance is insignificant, because two sentences can have the same lexical meaning but carry different emotional information.
Emotions have been the object of intense interest in both Eastern and Western philosophy since before the time of Lao-Tzu (sixth century B.C.) in the East and of Socrates (470-399 B.C.) in the West, and most contemporary thinking about emotions in psychology can be linked to one Western philosophical tradition or another (Calhoun & Solomon, 1984). However, modern scientific inquiry into the nature of emotion is thought by many to have begun with Charles Darwin’s study of emotional expression in animals and humans (Darwin, 1965). A survey of contemporary research on emotion in psychology reveals four general perspectives on defining, studying, and explaining emotion (Cornelius, 1996): the Darwinian, the Jamesian, the cognitive, and the social constructivist perspectives. Each represents a different way of thinking about emotions. Each has its own set of assumptions about how to define, construct theories about, and conduct research on emotion, and each has associated with it its own tradition of research (Ekman & Sullivan, 1987; Levenson, Ekman, & Friesen, 1990; Smith & Kleinman, 1989; Smith & Lazarus, 1993).
Extensive investigation of the dimensions of emotion suggests that at least six emotions are universal. Several other emotions, and many combinations of emotions, have been studied but remain unconfirmed as universally distinguishable. The six principal emotions are happiness, sadness, anger, fear, surprise, and disgust; they are the focus of study in this chapter.