Opportunistic Detection Methods for Emotion-Aware Smartphone Applications

Igor Bisio, Alessandro Delfino, Fabio Lavagetto, Mario Marchese
DOI: 10.4018/978-1-5225-0159-6.ch028

Abstract

Human-machine interaction is performed through tools such as the keyboard, the touch screen, or speech-to-text applications. For example, a speech-to-text application is software that allows the device to translate spoken words into text. These tools translate explicit messages but ignore implicit ones, such as the emotional status of the speaker, filtering out a portion of the information available in the interaction process. This chapter focuses on emotion detection. An emotion-aware device can interact more personally with its owner and react appropriately to the user's mood, making the user-machine interaction less stressful. The chapter gives guidelines for building emotion-aware smartphone applications that work in an opportunistic way (i.e., without the user's collaboration). In general, smartphone applications may be employed in different contexts; therefore, the emotions to be detected may differ.

Introduction

In recent years, the computational capabilities of mobile devices such as smartphones have increased exponentially, enabling more personal human-machine interaction. Moreover, smart portable devices such as smartphones can collect a great deal of data from the surrounding environment opportunistically, i.e., without requiring any collaborative behavior from the user, and exploit the information so obtained to adapt their behavior to the context, enabling the development of so-called Context-Aware applications. Speech is the fastest and most personal method of interaction; furthermore, speech is a signal that the smartphone can exploit opportunistically. Through speech, the mobile device can identify the speaker, the speaker's gender, and, if more than one speaker is present, the number of speakers (Agneessens, Bisio, Lavagetto, Marchese, & Sciarrone, 2010).

Currently, human-machine interaction is performed through devices such as the keyboard, the mouse, the touch screen, and, especially in new-generation smartphones, through speech-to-text applications: software that allows the device to translate spoken words into text. These tools translate explicit messages but ignore implicit ones, such as the emotional status of the user, filtering out a portion of the information available in the interaction process.

Automatic emotion detection can be used in a wide range of applications. In teleconferences, adding an explicit reference to the emotional state of the speaker can supply useful information that would otherwise be lost due to the reduced naturalism of the medium. An emotion-aware device can also interact more personally with its owner and react appropriately to the user's mood, making the user-machine interaction less stressful. For example, it has been shown (Burkhardt, 2005) that users often get frustrated when talking to machines and hang up without having any way to express their anger; an emotion-aware system can recognize the user's mood and handle this problem. A smart device can also automatically adapt a playlist in order to play the song best suited to the mood detected in the user (Dornbush, 2005). In life-simulation videogames, where the user has to control one or more virtual life forms, the experience can be enhanced with an automatic emotion-sensitive system capable of detecting the emotional state of the player.

Several experts predict significant growth over the next few years in the market for converged mobile devices that combine voice-phone functions with multimedia, PDA, and gaming applications. These devices will expand the current market by attracting new types of consumers. In fact, users will employ these devices for activities very different from classical mobile phone calls. This new trend will drive both Original Equipment Manufacturers (OEMs) and Carriers to meet this growth by providing smart devices and new services for the new class of users.

In more detail, in 2003 converged mobile devices, also termed smartphones, were forecast to make up three percent of worldwide mobile phone sales volume. Nowadays, the smartphone market continues to expand at triple-digit year-over-year growth rates, driven by the evolution of voice-centric converged mobile devices: mobile phones with application processors and advanced operating systems that support a new range of data functions, including application download and execution.

In practice, smartphones will play a crucial role in supporting users' activities from both professional and private viewpoints.

Operatively, emotion recognition is a statistical pattern classification problem; a general speech emotion recognition procedure is shown in the flowchart in Figure 1. The speech is recorded by the smartphone's microphone, and the raw audio data is passed to the feature extraction block. Feature extraction consists in reducing the amount of resources required to describe a large set of data while still describing it accurately. In the classification stage, machine learning methods are applied to the selected speech features to recognize the emotional states conveyed in the speech.

Figure 1. Scheme of a general emotion recognition procedure
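As a concrete illustration of the procedure in Figure 1, the following is a minimal sketch of a feature extraction and classification pipeline in Python. It assumes the librosa and scikit-learn libraries; the MFCC-based features, the support vector machine classifier, and the file names and labels are illustrative assumptions, not the chapter's prescribed method.

# Minimal sketch of the pipeline in Figure 1: feature extraction
# followed by statistical classification. Assumes librosa and
# scikit-learn; MFCC statistics and an SVM are illustrative choices.
import numpy as np
import librosa
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def extract_features(wav_path):
    """Reduce a raw recording to a small, fixed-size descriptor."""
    y, sr = librosa.load(wav_path, sr=16000)            # raw audio data
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # frame-level features
    # Summarize the frames with per-coefficient mean and standard deviation.
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Hypothetical labeled recordings; the emotion labels are the targets.
train_files = [("happy_01.wav", "happy"), ("angry_01.wav", "angry")]
X = np.array([extract_features(path) for path, _ in train_files])
y = [label for _, label in train_files]

# Classification stage: a machine learning method applied to the features.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X, y)

print(clf.predict([extract_features("new_utterance.wav")]))  # e.g. ['happy']

In a deployed application, the raw audio would come opportunistically from the smartphone microphone rather than from files, and the classifier would be trained offline on a labeled emotional speech corpus.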
