Comparison of Several Acoustic Modeling Techniques for Speech Emotion Recognition

Imen Trabelsi, Med Salim Bouhlel
ISBN13: 9781799824602 | ISBN10: 1799824608 | EISBN13: 9781799824619
DOI: 10.4018/978-1-7998-2460-2.ch015
Cite Chapter

MLA

Trabelsi, Imen, and Med Salim Bouhlel. "Comparison of Several Acoustic Modeling Techniques for Speech Emotion Recognition." Cognitive Analytics: Concepts, Methodologies, Tools, and Applications, edited by Information Resources Management Association, IGI Global, 2020, pp. 283-293. https://doi.org/10.4018/978-1-7998-2460-2.ch015

APA

Trabelsi, I., & Bouhlel, M. S. (2020). Comparison of Several Acoustic Modeling Techniques for Speech Emotion Recognition. In Information Resources Management Association (Ed.), Cognitive Analytics: Concepts, Methodologies, Tools, and Applications (pp. 283-293). IGI Global. https://doi.org/10.4018/978-1-7998-2460-2.ch015

Chicago

Trabelsi, Imen, and Med Salim Bouhlel. "Comparison of Several Acoustic Modeling Techniques for Speech Emotion Recognition." In Cognitive Analytics: Concepts, Methodologies, Tools, and Applications, edited by Information Resources Management Association, 283-293. Hershey, PA: IGI Global, 2020. https://doi.org/10.4018/978-1-7998-2460-2.ch015

Abstract

Automatic Speech Emotion Recognition (SER) is an active research topic in the field of Human-Computer Interaction (HCI) with a wide range of applications. The purpose of a speech emotion recognition system is to automatically classify a speaker's utterances into emotional states such as disgust, boredom, sadness, neutral, and happiness. The speech samples in this paper are taken from the Berlin emotional database. Mel-frequency cepstral coefficients (MFCC), linear prediction coefficients (LPC), linear prediction cepstral coefficients (LPCC), Perceptual Linear Prediction (PLP), and Relative Spectral Perceptual Linear Prediction (RASTA-PLP) features are used to characterize the emotional utterances. Classification combines Gaussian mixture models (GMM) with Support Vector Machines (SVM) through a kernel based on the Kullback-Leibler divergence. The study compares the effect of feature type and feature dimension on recognition performance. The best results are obtained with 12-coefficient MFCC. With these features, a recognition rate of 84% is achieved, which is close to human performance on this database.
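
The abstract does not give the exact form of the Kullback-Leibler divergence kernel, so the following is only a minimal sketch of one common construction, not the chapter's method. It assumes each utterance is summarized by a single diagonal-covariance Gaussian (a simplification of a full GMM, for which the KL divergence has no closed form) and that the kernel is the exponentiated symmetric KL divergence. The function names, the `scale` parameter, and the random 12-dimensional frames standing in for MFCC vectors are all hypothetical.

```python
import numpy as np

def fit_diag_gaussian(frames, eps=1e-6):
    """Per-dimension mean and variance of one utterance (frames x dims)."""
    frames = np.asarray(frames, dtype=float)
    # eps keeps variances strictly positive so the log and division are safe.
    return frames.mean(axis=0), frames.var(axis=0) + eps

def kl_diag_gaussian(mu_p, var_p, mu_q, var_q):
    """Closed-form KL(p || q) for two diagonal-covariance Gaussians."""
    return 0.5 * np.sum(
        np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0
    )

def kl_kernel(p, q, scale=1.0):
    """Assumed kernel form: exp(-scale * symmetric KL divergence)."""
    sym_kl = kl_diag_gaussian(*p, *q) + kl_diag_gaussian(*q, *p)
    return np.exp(-scale * sym_kl)

# Hypothetical data: random frames standing in for 12-coefficient MFCCs.
rng = np.random.default_rng(0)
utterances = [rng.normal(loc=shift, size=(80, 12)) for shift in (0.0, 0.2, 1.0)]
models = [fit_diag_gaussian(u) for u in utterances]

# Gram matrix of pairwise kernel values between utterance models.
gram = np.array([[kl_kernel(a, b, scale=0.1) for b in models] for a in models])
print(np.round(gram, 3))
```

A Gram matrix of this kind could then be passed to an SVM that accepts a precomputed kernel. Note that an exponentiated symmetric KL divergence is not guaranteed to be positive semi-definite in general, which is one reason practical systems often use approximations tailored to GMMs.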
