Emotion Recognition From Speech Using Perceptual Filter and Neural Network

Revathi A. (SASTRA University, India) and Sasikaladevi N. (SASTRA University, India)
Copyright: © 2020 | Pages: 14
DOI: 10.4018/978-1-7998-1159-6.ch004

Abstract

This chapter on multi-speaker, speaker-independent emotion recognition uses perceptual features computed with filters spaced on the Equivalent Rectangular Bandwidth (ERB) and Bark scales, a vector quantization (VQ) classifier to classify emotion groups, and an artificial neural network (ANN) trained with the backpropagation algorithm to classify emotions within a group. Performance can be improved by training the system with a larger amount of data for each emotion. With the limited data available, the proposed system consistently gave better accuracy for the perceptual feature with critical band analysis done on the ERB scale.
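The abstract refers to filters spaced on the ERB and Bark scales. The chapter does not give its formulas, so the sketch below assumes the Glasberg and Moore (1990) ERB-rate formula and the Bark warping 6·asinh(f/600) used in Hermansky's PLP analysis, and places filter edges at equal steps on the chosen scale. The triangular filter shape and all helper names (`hz_to_erb_rate`, `hz_to_bark`, `filterbank`) are illustrative assumptions; PLP proper uses critical-band masking curves rather than triangles.

```python
import numpy as np

def hz_to_erb_rate(f):
    # Glasberg & Moore (1990) ERB-rate (ERB number) scale
    return 21.4 * np.log10(1.0 + 0.00437 * np.asarray(f, dtype=float))

def erb_rate_to_hz(e):
    # Inverse of hz_to_erb_rate
    return (10.0 ** (np.asarray(e, dtype=float) / 21.4) - 1.0) / 0.00437

def hz_to_bark(f):
    # Bark warping used in PLP analysis: 6 * asinh(f / 600)
    return 6.0 * np.arcsinh(np.asarray(f, dtype=float) / 600.0)

def bark_to_hz(z):
    # Inverse of hz_to_bark
    return 600.0 * np.sinh(np.asarray(z, dtype=float) / 6.0)

SCALES = {"erb": (hz_to_erb_rate, erb_rate_to_hz), "bark": (hz_to_bark, bark_to_hz)}

def filterbank(n_filters, n_fft, sr, scale="erb", f_low=0.0, f_high=None):
    """Hypothetical triangular filterbank with edges equally spaced on `scale`.

    Returns an array of shape (n_filters, n_fft // 2 + 1) that maps a
    one-sided power spectrum to band energies.
    """
    f_high = sr / 2.0 if f_high is None else f_high
    fwd, inv = SCALES[scale]
    # n_filters + 2 equally spaced points; each filter spans three neighbours
    edges = inv(np.linspace(fwd(f_low), fwd(f_high), n_filters + 2))
    bins = np.fft.rfftfreq(n_fft, d=1.0 / sr)  # centre frequency of each FFT bin
    fb = np.zeros((n_filters, bins.size))
    for i in range(n_filters):
        lo, centre, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (bins - lo) / (centre - lo)
        falling = (hi - bins) / (hi - centre)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb
```

For a 512-point FFT at 16 kHz, `filterbank(24, 512, 16000, "erb") @ power_spectrum` (with `power_spectrum` of length 257) gives 24 band energies; a PLP front end would then apply loudness compression and linear prediction to them. Switching `scale` to `"bark"` is the only change needed to compare the two spacings.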
Chapter Preview

Introduction

A speech signal is the acoustic signal produced when the vocal tract is excited by quasi-periodic pulses of air for voiced sounds and by noise-like excitation for unvoiced sounds. Speech utterances reveal the linguistic content, accent, slang and emotional state of a speaker. Recognizing emotions from speech is difficult when only a limited set of data is available. Emotion recognition from speech has found applications in call centers and in the unmanned control of hazardous processes. Such a system could also support the treatment of patients with intellectual disabilities and of patients with depression and anxiety. Web-related services, information retrieval and data synthesis could use automated emotion recognition, and such systems could help robots respond to speech commands given by an emotional operator.

Siqing Wu et al. (Wu, 2011) introduced modulation spectral features for emotion recognition. Chi-Chun Lee et al. (Lee, 2011) used a hierarchical binary classifier with acoustic and statistical features. K. Sreenivasa Rao et al. (Rao, 2012) used MFCC features and GMMs to recognize emotions. Ankur Sapra et al. (Sapra, 2013) used a modified MFCC feature with a neural network classifier. Shashidhar G. Koolagudi et al. (Koolagudi, 2012) used MFCC features and GMMs for speaker recognition in an emotional environment.

In this chapter on speaker-independent emotion recognition, SVMs are used to create templates for all emotions, and the system is evaluated with speech from a speaker not used for training. Training utterances are converted into sets of features, and SVM models are developed to represent the emotions. During testing, group classification is done with a minimum distance classifier, and individual emotion classification is then done within the group containing the pertinent emotion models, using a linear binary classifier.
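The group stage is described as a minimum distance classifier, and the abstract names VQ as the group classifier. A minimal sketch, assuming one k-means codebook per emotion group and classification by the lowest average quantization distortion over the frames of a test utterance, is shown below. The codebook size, iteration count, group names and the plain k-means training (rather than, say, LBG splitting) are hypothetical choices, not details taken from the chapter.

```python
import numpy as np

def train_codebook(features, codebook_size=16, n_iter=20, seed=0):
    """k-means codebook for one emotion group; features: (n_frames, dim)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(features), size=codebook_size, replace=False)
    codebook = features[idx].astype(float)  # initialise from random frames
    for _ in range(n_iter):
        # squared Euclidean distance of every frame to every codeword
        d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        nearest = d.argmin(axis=1)
        for k in range(codebook_size):
            members = features[nearest == k]
            if len(members):  # keep the old codeword if its cluster is empty
                codebook[k] = members.mean(axis=0)
    return codebook

def avg_distortion(features, codebook):
    """Mean squared distance from each frame to its nearest codeword."""
    d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.min(axis=1).mean()

def classify_group(features, codebooks):
    """Return the group whose codebook gives the minimum average distortion."""
    return min(codebooks, key=lambda g: avg_distortion(features, codebooks[g]))
```

Usage, with hypothetical group labels: `codebooks = {g: train_codebook(x) for g, x in training_frames.items()}` followed by `classify_group(test_frames, codebooks)`. Averaging distortion over frames lets utterances of different lengths be compared directly.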
Perceptual linear predictive (PLP) cepstra with critical band analysis done on the Bark and ERB scales are used as features in this work; they provide complementary evidence in assessing the performance of the ANN-based system. The ANN model is specified by the number of hidden layers and the number of neurons in each hidden layer. The weights between layers are optimized iteratively, and the output layer has two neurons that choose one of the two emotions in a group. This chapter also compares the features comprehensively to assess the performance of both speaker-independent and speaker-dependent emotion recognition systems.
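The within-group stage can be sketched as a one-hidden-layer network with two output neurons trained by backpropagation. This sketch assumes a sigmoid hidden layer, a softmax output with cross-entropy loss, full-batch gradient descent and one fixed-length feature vector per utterance; the chapter does not specify these choices, and the class name, layer size, learning rate and epoch count are hypothetical.

```python
import numpy as np

class TwoEmotionMLP:
    """Hypothetical 1-hidden-layer network choosing between two emotions."""

    def __init__(self, n_in, n_hidden=10, lr=0.5, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, 2))
        self.b2 = np.zeros(2)
        self.lr = lr

    def _forward(self, X):
        H = 1.0 / (1.0 + np.exp(-(X @ self.W1 + self.b1)))  # sigmoid hidden layer
        Z = H @ self.W2 + self.b2
        Z = Z - Z.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(Z)
        P /= P.sum(axis=1, keepdims=True)  # softmax over the two output neurons
        return H, P

    def fit(self, X, y, epochs=1000):
        T = np.eye(2)[y]  # one-hot targets; y holds 0 or 1 per utterance
        n = len(X)
        for _ in range(epochs):
            H, P = self._forward(X)
            dZ = (P - T) / n  # gradient of mean cross-entropy w.r.t. logits
            dW2, db2 = H.T @ dZ, dZ.sum(axis=0)
            dH = (dZ @ self.W2.T) * H * (1.0 - H)  # backpropagate through sigmoid
            dW1, db1 = X.T @ dH, dH.sum(axis=0)
            # gradient-descent weight updates
            self.W2 -= self.lr * dW2
            self.b2 -= self.lr * db2
            self.W1 -= self.lr * dW1
            self.b1 -= self.lr * db1
        return self

    def predict(self, X):
        """Index (0 or 1) of the winning output neuron for each row of X."""
        return self._forward(X)[1].argmax(axis=1)
```

One such network would be trained per emotion group, and it is consulted only after the group stage has selected that group.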

Affective computing plays a pivotal role as an interface between humans and machines. A speech-based emotion recognition system is difficult to implement because the available datasets contain a limited set of utterances spoken by a limited set of speakers, and emotion recognition from speech has been studied on various databases. For multi-speaker, speaker-independent emotion recognition, this chapter uses perceptual features with filters spaced on the ERB and Bark scales, a VQ classifier for group classification and an ANN trained with backpropagation for emotion classification within a group. Performance could be improved with more training data per emotion. With the limited data available, the perceptual feature with critical band analysis on the ERB scale consistently gave better accuracy, with an overall accuracy of 76%. Decision-level fusion classification yielded 100% accuracy for all emotions except fear and boredom, and its overall accuracy is 78%. For speaker-dependent emotion recognition, the perceptual feature with critical band analysis on the ERB scale gave 100% accuracy for all emotions, and the perceptual linear predictive cepstrum gave 100% accuracy for all emotions except anger and fear.
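The chapter does not state its decision-level fusion rule. One common option, shown here purely as an assumption, is the sum (mean) rule: average the per-emotion scores produced by the Bark-based and ERB-based classifiers and pick the highest. The sketch also computes per-emotion and overall accuracy, the quantities reported above; it assumes the scores from each feature stream are on a comparable scale (for example softmax outputs), and the function names are hypothetical.

```python
import numpy as np

def fuse_decisions(score_list):
    """Sum-rule fusion: average per-class scores across feature streams.

    score_list: sequence of arrays, each of shape (n_utterances, n_classes).
    Returns the index of the winning class for each utterance.
    """
    stacked = np.stack([np.asarray(s, dtype=float) for s in score_list])
    return stacked.mean(axis=0).argmax(axis=1)

def per_class_accuracy(y_true, y_pred, n_classes):
    """Fraction of utterances of each true class that were labelled correctly."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return {c: float((y_pred[y_true == c] == c).mean())
            for c in range(n_classes) if np.any(y_true == c)}

def overall_accuracy(y_true, y_pred):
    """Fraction of all utterances labelled correctly."""
    return float((np.asarray(y_true) == np.asarray(y_pred)).mean())
```

With only two streams, a majority vote would tie whenever they disagree; averaging scores avoids that by letting the more confident classifier decide.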
