Affectively Enhanced Subs: Visualization of Auditory Events With Color Scales and Animation

Dimitrios G. Margounakis (Aristotle University of Thessaloniki, Greece), Andreas-Georgios Karamanos (Hellenic Open University, Greece) and Andreas Floros (Ionian University, Greece)
DOI: 10.4018/978-1-7998-0253-2.ch008

Abstract

This chapter presents a framework for a system that enhances the subtitles of a video file by detecting affective audio content. It implements a model for visualizing the general and emotional information typically carried by the audio track of a video file (TV programs or movies). Depending on its nature, the content is visualized as an animated cartoon (a "sound assistant") and/or in the corresponding subtitle text channel. This helps hearing-impaired viewers follow the material and, more generally, supports any situation in which sound is hard to perceive during playback. The animated cartoon is proposed as a way of representing both the emotional content of an audio event (speech) and its general content (type of sound) in video, while color scales applied to the subtitle fonts represent basic emotions.

Background

Malandrakis et al. (2011) experimented with continuous-time, continuous-scale affective movie content recognition and investigated a variety of audio-visual features, working on twelve 30-minute movie clips. Kalyan & Kim (2009) applied Natural Language Processing (NLP) techniques to video subtitle dialogues in order to detect emotional scenes. Both approaches aimed at extracting emotions automatically.

Other work, by contrast, represents emotions without extracting them automatically. Ohene-Djan et al. (2007) presented an emotional subtitle editor and proposed color variation to represent characters and font variation to represent emotion. In the study by Fels et al. (2005), an audience of deaf and hearing-impaired individuals was shown two video segments, each produced in three versions with different caption styles: conventional captions, emotive captions placed in one consistent location (lower center of the screen), and emotive captions placed in different locations to identify the speaker. The emotive captions used graphics, color, and icons to represent the emotions that had been identified. Rashid et al. (2006) created a framework relating animation properties (applied to animated text captions) to a set of eight basic emotions.

For recognizing emotion from faces, Microsoft Azure has created an Emotion API¹. Finally, in terms of general sound, YouTube™ has recently announced the addition of sound effect information to the automatic caption track of YouTube videos, currently for applause, music, and laughter².
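
The idea of coloring caption text according to emotion, which runs through the work surveyed above, can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the chapter: the emotion-to-hue table, the function names `emotion_color` and `srt_cue`, and the choice of letting saturation grow with intensity. The `<font color="...">` tag is an informal SubRip (SRT) convention that many, but not all, players honor.

```python
import colorsys

# Hypothetical emotion-to-hue table (degrees on the HSV wheel).
# These values are illustrative and are not taken from the chapter.
EMOTION_HUES = {
    "anger": 0,
    "happiness": 50,
    "sadness": 220,
    "fear": 280,
}


def emotion_color(emotion: str, intensity: float) -> str:
    """Return a hex color whose saturation grows with intensity (0.0 to 1.0).

    Unknown emotions fall back to plain white subtitle text.
    """
    if emotion not in EMOTION_HUES:
        return "#FFFFFF"
    level = min(max(intensity, 0.0), 1.0)  # clamp to the valid range
    hue = EMOTION_HUES[emotion] / 360.0
    r, g, b = colorsys.hsv_to_rgb(hue, level, 1.0)
    return "#{:02X}{:02X}{:02X}".format(
        round(r * 255), round(g * 255), round(b * 255)
    )


def srt_cue(index: int, start: str, end: str, text: str,
            emotion: str, intensity: float) -> str:
    """Build one SRT cue with the text wrapped in a colored font tag."""
    color = emotion_color(emotion, intensity)
    return f'{index}\n{start} --> {end}\n<font color="{color}">{text}</font>\n'


if __name__ == "__main__":
    print(srt_cue(1, "00:00:01,000", "00:00:03,000", "Get out!", "anger", 1.0))
```

In this sketch a neutral or weak emotion stays close to white, so the text remains readable and only strongly affective lines stand out. A real system would also need to check contrast against the video background.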

Key Terms in this Chapter

Classification: The process of categorization, in which ideas and objects are recognized, differentiated, and understood. The most commonly used classifiers (algorithms) are K-Nearest Neighbors (KNN), Hidden Markov Models (HMM), Gaussian Mixture Models (GMM), Support Vector Machines (SVM), and Artificial Neural Networks (ANN).
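
As a rough illustration of the simplest classifier named above, the following sketch implements K-Nearest Neighbors in plain Python. The two features (mean pitch and mean energy), the toy training values, and the labels are hypothetical and chosen only to make the example runnable. They do not come from the chapter.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training points.

    train: list of (feature_vector, label) pairs
    query: feature vector with the same length as the training vectors
    """
    # Sort training points by Euclidean distance to the query, keep the k closest.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): [mean pitch in Hz, mean energy on a 0-1 scale]
TRAIN = [
    ([220.0, 0.80], "anger"),
    ([240.0, 0.90], "anger"),
    ([230.0, 0.85], "anger"),
    ([120.0, 0.20], "sadness"),
    ([110.0, 0.25], "sadness"),
    ([130.0, 0.15], "sadness"),
]

if __name__ == "__main__":
    print(knn_predict(TRAIN, [225.0, 0.82]))
    print(knn_predict(TRAIN, [115.0, 0.20]))
```

Because the features here are left unscaled, pitch dominates the distance. In practice features are usually normalized before a distance-based classifier such as KNN is applied.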

Emotive Captioning: Closed captioning has allowed people who are deaf or hard of hearing to be included as audience members. However, some of the audio information, especially the affective content, is generally not conveyed by captions. Emotive captioning is a research area that investigates ways to include some of this information in closed captions.

Affective Content: Multimedia content that stimulates the audience's emotions and reactions (e.g., the funniest or the most sentimental segments of a movie).

Audio Emotional Event: Audio or video segment with affective content.

Descriptive Subtitles: Subtitles of a movie that also contain part of the script in the form of auditory information (e.g., “shouting”, “phone ringing”).

Hearing Impaired Technologies (aka Assistive Technologies for the Deaf and Hard of Hearing): Products and technology used to help with hearing loss.
