Comparison of Several Acoustic Modeling Techniques for Speech Emotion Recognition

Imen Trabelsi, Med Salim Bouhlel

Source Title: Cognitive Analytics: Concepts, Methodologies, Tools, and Applications

ISBN13: 9781799824602|ISBN10: 1799824608|EISBN13: 9781799824619

DOI: 10.4018/978-1-7998-2460-2.ch015

MLA

Trabelsi, Imen, and Med Salim Bouhlel. "Comparison of Several Acoustic Modeling Techniques for Speech Emotion Recognition." Cognitive Analytics: Concepts, Methodologies, Tools, and Applications, edited by Information Resources Management Association, IGI Global, 2020, pp. 283-293. https://doi.org/10.4018/978-1-7998-2460-2.ch015

APA

Trabelsi, I., & Bouhlel, M. S. (2020). Comparison of Several Acoustic Modeling Techniques for Speech Emotion Recognition. In Information Resources Management Association (Ed.), Cognitive Analytics: Concepts, Methodologies, Tools, and Applications (pp. 283-293). IGI Global. https://doi.org/10.4018/978-1-7998-2460-2.ch015

Chicago

Trabelsi, Imen, and Med Salim Bouhlel. "Comparison of Several Acoustic Modeling Techniques for Speech Emotion Recognition." In Cognitive Analytics: Concepts, Methodologies, Tools, and Applications, edited by Information Resources Management Association, 283-293. Hershey, PA: IGI Global, 2020. https://doi.org/10.4018/978-1-7998-2460-2.ch015

Abstract

Automatic Speech Emotion Recognition (SER) is an active research topic in the field of Human-Computer Interaction (HCI) with a wide range of applications. The purpose of a speech emotion recognition system is to automatically classify a speaker's utterances into emotional states such as disgust, boredom, sadness, neutral, and happiness. The speech samples in this paper are taken from the Berlin emotional database. Mel-frequency cepstral coefficients (MFCC), linear prediction coefficients (LPC), linear prediction cepstral coefficients (LPCC), Perceptual Linear Prediction (PLP), and Relative Spectral Perceptual Linear Prediction (RASTA-PLP) features are used to characterize the emotional utterances, which are modeled by combining Gaussian mixture models (GMM) with Support Vector Machines (SVM) based on the Kullback-Leibler divergence kernel. The effects of feature type and feature dimension are compared. The best results are obtained with 12-coefficient MFCC. With these features, a recognition rate of 84% is achieved, which is close to human performance on this database.
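
The abstract does not spell out how the GMM and the SVM are combined. As a rough illustration only, the sketch below shows one common formulation from the GMM-SVM literature: each utterance is represented by the component means of an adapted GMM, and a linear kernel derived from an upper bound on the Kullback-Leibler divergence compares two utterances. The function names, the shared weights and diagonal variances, and the random stand-in data are all assumptions for the example, not details taken from the chapter.

```python
import numpy as np

def kl_supervector(means, weights, variances):
    """Stack component means, scaled by sqrt(weight) / sqrt(variance), into one vector.

    means, variances: arrays of shape (n_components, n_dims), diagonal covariances.
    weights: array of shape (n_components,).
    """
    scale = np.sqrt(weights)[:, None] / np.sqrt(variances)
    return (scale * means).ravel()

def kl_linear_kernel(means_a, means_b, weights, variances):
    """Linear kernel between two utterances that share GMM weights and variances."""
    sv_a = kl_supervector(means_a, weights, variances)
    sv_b = kl_supervector(means_b, weights, variances)
    return float(sv_a @ sv_b)

# Hypothetical stand-in data: 4 mixture components over 12-dimensional features
# (12 matches the MFCC dimension the abstract reports as best).
rng = np.random.default_rng(0)
n_components, n_dims = 4, 12
weights = np.full(n_components, 1.0 / n_components)
variances = np.ones((n_components, n_dims))
means_a = rng.normal(size=(n_components, n_dims))  # adapted means, utterance A
means_b = rng.normal(size=(n_components, n_dims))  # adapted means, utterance B

k_ab = kl_linear_kernel(means_a, means_b, weights, variances)
k_aa = kl_linear_kernel(means_a, means_a, weights, variances)
```

Under these assumptions, a matrix of such kernel values over all training utterances could be handed to any SVM implementation that accepts a precomputed kernel.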
