Audio-Visual Emotion Recognition System Using Multi-Modal Features

Anand Handa, Rashi Agarwal, Narendra Kohli

Source Title: International Journal of Cognitive Informatics and Natural Intelligence (IJCINI) 15(4)

DOI: 10.4018/IJCINI.20211001.oa34

Abstract

Because face geometry and appearance vary widely, Facial Expression Recognition (FER) is still a challenging problem. Convolutional neural networks (CNNs) can characterize 2-D signals. Therefore, for emotion recognition in video, the authors propose a feature selection model in the AlexNet architecture to extract and filter facial features automatically. Similarly, for emotion recognition in audio, the authors use a deep LSTM-RNN. Finally, they propose a probabilistic model that fuses the audio and visual models using the facial features and speech of a subject. The model combines all the extracted features and uses them to train linear SVM (Support Vector Machine) classifiers. The proposed model outperforms the other existing models and achieves state-of-the-art performance for the audio, visual, and fusion models. The model classifies the seven known facial expressions, namely anger, happy, surprise, fear, disgust, sad, and neutral, on the eNTERFACE’05 dataset with an overall accuracy of 76.61%.

Introduction

In recent years, computer vision has produced outstanding results on tasks like face recognition, emotion recognition, and speech recognition, largely because of the adoption of advanced techniques like machine learning. However, human expression recognition is still an onerous task. The first Emotion Recognition in the Wild (EmotiW) challenge (Dhall et al., 2013) was held in 2013. Since then, classification accuracy has risen considerably from a baseline of 38%, but there is still room for improvement. Several factors have kept accuracy low in the past: a lack of labeled video datasets, the ambiguous nature of facial expressions, and the limited effectiveness of the methods used to extract facial expressions. In the last few years, Deep Convolutional Neural Networks (DCNNs) (Schmidhuber, 2015) have proven outstanding at extracting features from images, and Long Short-Term Memory (LSTM) networks have proven highly effective at analyzing sequential data (Sak et al., 2014). Combining these recent and effective methods may therefore increase the accuracy of classifying human facial expressions. The main contributions of this paper can be summarized as follows:

• A separate feature selection model is introduced in the AlexNet architecture, which automatically filters the most prominent facial features. It improves the overall accuracy of the model.
• Separate models for audio and visual emotion recognition with better classification accuracy.
• A probabilistic audio-visual fusion model using an SVM classifier, which classifies emotions with better accuracy.

The rest of the paper is organized as follows: Section 2 discusses the related work. In Section 3, the authors present the multi-modal emotion recognition framework, including the discussion of datasets, multi-modal features, and network architecture. In Section 4, the authors present the experimental setup for audio and visual emotion recognition. In Section 5, the experimental results from the audio, video, and audio-visual fusion-based recognition models are discussed separately, and Section 6 concludes the paper.

Related Work

A multi-modal approach to emotion recognition is more powerful and efficient than the bimodal and unimodal approaches because human emotions depend on both audio and visual information. In recent years, many studies have addressed audio-visual recognition of human emotions, and they show that fusing audio and visual information is advantageous for emotion recognition. In this section, the authors discuss a few of them.

Mansoorizadeh and Charkari (2010) propose a fusion-based approach to emotion recognition that uses both decision-level and feature-level fusion. Features related to the same emotion have a higher chance of overlapping. The proposed framework combines the features of the different modalities and generates a hybrid feature space. The experiments are performed on two different audio-visual emotion databases with 42 and 12 subjects. The accuracy of the proposed model is higher than that of the individual unimodal and bimodal face-based and speech-based systems.
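
The difference between feature-level and decision-level fusion can be illustrated with a minimal Python sketch. This is not the method of Mansoorizadeh and Charkari or of the present paper: the function names, the weighted-average rule, the `audio_weight` parameter, and the example probabilities are all assumptions chosen for illustration. Feature-level fusion joins the per-modality feature vectors before classification, while decision-level fusion combines the class probabilities produced by separate audio and visual classifiers.

```python
from typing import Dict, List

# The seven emotion classes named in the abstract.
EMOTIONS = ["anger", "happy", "surprise", "fear", "disgust", "sad", "neutral"]


def feature_level_fusion(audio_features: List[float],
                         visual_features: List[float]) -> List[float]:
    """Concatenate per-modality feature vectors into one hybrid vector.

    A single classifier would then be trained on the combined vector.
    """
    return list(audio_features) + list(visual_features)


def decision_level_fusion(audio_probs: Dict[str, float],
                          visual_probs: Dict[str, float],
                          audio_weight: float = 0.5) -> Dict[str, float]:
    """Weighted average of class probabilities from two unimodal classifiers.

    The weighted-average rule is an illustrative assumption, not a rule
    taken from the source.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    if set(audio_probs) != set(visual_probs):
        raise ValueError("both modalities must score the same classes")
    fused = {
        label: audio_weight * audio_probs[label]
        + (1.0 - audio_weight) * visual_probs[label]
        for label in audio_probs
    }
    total = sum(fused.values())
    # Renormalize so the fused scores form a probability distribution.
    return {label: score / total for label, score in fused.items()}


def predict(probs: Dict[str, float]) -> str:
    """Return the class with the highest probability."""
    return max(probs, key=probs.get)


# Hypothetical classifier outputs for one clip (made-up numbers).
audio_example = {"anger": 0.40, "happy": 0.05, "surprise": 0.05, "fear": 0.35,
                 "disgust": 0.05, "sad": 0.05, "neutral": 0.05}
visual_example = {"anger": 0.20, "happy": 0.05, "surprise": 0.10, "fear": 0.50,
                  "disgust": 0.05, "sad": 0.05, "neutral": 0.05}

fused_example = decision_level_fusion(audio_example, visual_example)
print(predict(audio_example), predict(visual_example), predict(fused_example))
```

In this made-up example the audio classifier alone favors anger, but the fused distribution favors fear because the visual classifier is more confident. Tuning `audio_weight` on held-out data is one common way to balance the two modalities.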

R. Gajsek et al. (Štruc et al., 2010) propose an audio-visual recognition system based on the fusion of features. For the audio-based recognition model, cepstral and prosodic coefficients are extracted, and for the video-based recognition model, Gabor wavelets are used as features. Lastly, a multi-class classifier is used to combine the outputs.
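
To make the Gabor wavelet features concrete, the following sketch builds the real part of a 2-D Gabor kernel from its standard definition (a Gaussian envelope multiplied by a cosine carrier) and computes a single filter response on an image patch. The source does not give the filter settings used by Gajsek et al., so the function names and the parameters `size`, `wavelength`, `theta`, `sigma`, `gamma`, and `psi` are illustrative assumptions only.

```python
import math
from typing import List


def gabor_kernel(size: int, wavelength: float, theta: float, sigma: float,
                 gamma: float = 0.5, psi: float = 0.0) -> List[List[float]]:
    """Real part of a 2-D Gabor kernel on a size x size grid.

    theta is the orientation in radians, wavelength the period of the
    cosine carrier in pixels, sigma the width of the Gaussian envelope,
    gamma the spatial aspect ratio, and psi the phase offset.
    """
    if size % 2 == 0:
        raise ValueError("size must be odd so the kernel has a center pixel")
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate the coordinates by the filter orientation.
            x_rot = x * math.cos(theta) + y * math.sin(theta)
            y_rot = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(
                -(x_rot ** 2 + (gamma * y_rot) ** 2) / (2.0 * sigma ** 2)
            )
            carrier = math.cos(2.0 * math.pi * x_rot / wavelength + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def filter_response(patch: List[List[float]],
                    kernel: List[List[float]]) -> float:
    """Inner product of an image patch with a kernel of the same size."""
    return sum(p * k
               for patch_row, kernel_row in zip(patch, kernel)
               for p, k in zip(patch_row, kernel_row))


# A small bank of four orientations; one response per orientation could
# serve as a feature for a face patch (the settings are arbitrary).
bank = [gabor_kernel(size=5, wavelength=4.0, theta=k * math.pi / 4, sigma=2.0)
        for k in range(4)]
flat_patch = [[1.0] * 5 for _ in range(5)]
features = [filter_response(flat_patch, kernel) for kernel in bank]
print(features)
```

In practice, a bank like this is usually applied at several scales as well as orientations, and the responses are computed by convolution over the whole face image rather than on a single patch.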

Avots et al. (2019) present an analysis of an audio-visual model for emotion recognition. They train the models on three databases, SAVEE, eNTERFACE’05, and RML, and use the AFEW database as the test set. MFCC coefficients represent the emotional speech, and an SVM classifier is used for classification. The proposed multimodal emotion recognition system is a decision-based fusion model. Facial image classification is performed using AlexNet. The reported accuracy for eNTERFACE’05 is 48.2%.
