Comparison of Several Acoustic Modeling Techniques for Speech Emotion Recognition


Imen Trabelsi, Med Salim Bouhlel
Copyright: © 2016 | Volume: 7 | Issue: 1 | Pages: 11
ISSN: 1947-9093 | EISSN: 1947-9107 | EISBN13: 9781466691407 | DOI: 10.4018/IJSE.2016010105

MLA

Trabelsi, Imen, and Med Salim Bouhlel. "Comparison of Several Acoustic Modeling Techniques for Speech Emotion Recognition." International Journal of Synthetic Emotions (IJSE), vol. 7, no. 1, 2016, pp. 58-68. http://doi.org/10.4018/IJSE.2016010105



Abstract

Automatic Speech Emotion Recognition (SER) is an active research topic in the field of Human-Computer Interaction (HCI) with a wide range of applications. The purpose of a speech emotion recognition system is to automatically classify a speaker's utterances into emotional states such as disgust, boredom, sadness, neutral, and happiness. The speech samples in this paper are taken from the Berlin emotional database. Mel-Frequency Cepstral Coefficient (MFCC), Linear Prediction Coefficient (LPC), Linear Prediction Cepstral Coefficient (LPCC), Perceptual Linear Prediction (PLP), and Relative Spectral Perceptual Linear Prediction (RASTA-PLP) features are used to characterize the emotional utterances, using a combination of Gaussian Mixture Models (GMM) and Support Vector Machines (SVM) based on the Kullback-Leibler divergence kernel. In this study, the effects of feature type and feature dimension are comparatively investigated. The best results are obtained with 12-coefficient MFCC features: a recognition rate of 84%, which is close to human performance on this database.
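The GMM/SVM combination described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: synthetic 12-dimensional feature sequences stand in for real MFCCs extracted from the Berlin database, each utterance is modeled by a single diagonal Gaussian rather than a full GMM, and the kernel uses the closed-form symmetric KL divergence between those Gaussians fed to an SVM with a precomputed kernel. All names and parameters below are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def kl_diag_gauss(m1, v1, m2, v2):
    # Closed-form KL divergence between two diagonal Gaussians
    # KL(p||q) = 0.5 * sum( log(v_q/v_p) + (v_p + (m_p - m_q)^2)/v_q - 1 )
    return 0.5 * np.sum(np.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

# Synthetic stand-in for 12-dim MFCC frame sequences (hypothetical data,
# not the Berlin database): each utterance is a (frames, 12) array whose
# mean depends on its emotion label.
n_per_class, n_dim = 20, 12
utts, labels = [], []
for label in range(3):                     # e.g. 3 emotion classes
    for _ in range(n_per_class):
        frames = rng.normal(loc=label * 1.5, scale=1.0, size=(100, n_dim))
        utts.append(frames)
        labels.append(label)
labels = np.array(labels)

# Model each utterance with a single diagonal Gaussian
# (a 1-component stand-in for the paper's GMMs)
means = np.array([u.mean(axis=0) for u in utts])
varis = np.array([u.var(axis=0) + 1e-6 for u in utts])

# KL-divergence kernel: K(i, j) = exp(-a * (KL(i||j) + KL(j||i)))
n = len(utts)
K = np.zeros((n, n))
for i in range(n):
    for j in range(i, n):
        d = kl_diag_gauss(means[i], varis[i], means[j], varis[j]) \
          + kl_diag_gauss(means[j], varis[j], means[i], varis[i])
        K[i, j] = K[j, i] = np.exp(-0.1 * d)

# SVM on the precomputed KL-based kernel matrix
clf = SVC(kernel="precomputed").fit(K, labels)
acc = (clf.predict(K) == labels).mean()
print(f"training accuracy: {acc:.2f}")
```

In the full method, each utterance (or emotion class) would be modeled by a multi-component GMM over real MFCC/LPC/PLP features, and the kernel would use an approximation to the KL divergence between GMMs, which has no closed form.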
