Speech Emotion Analysis of Different Age Groups Using Clustering Techniques

Hemanta Kumar Palo (Siksha 'O' Anusandhan University, Bhubaneswar, India), Mihir Narayan Mohanty (Department of Electronics and Communication Engineering, Siksha 'O' Anusandhan University, Bhubaneswar, India) and Mahesh Chandra (Birla Institute of Technology, Ranchi, India)
Copyright: © 2018 |Pages: 17
DOI: 10.4018/IJIRR.2018010105


The shape, length, and size of the vocal tract and vocal folds vary with a person's age. Such variation may also arise from sickness or other conditions. Consequently, the features extracted from utterances for a recognition task may differ across age groups, and the task becomes more complicated when different emotions are involved. A recognition system therefore demands suitable feature extraction and clustering techniques that can separate the emotional utterances of each group. Psychologists, criminal investigators, professional counselors, law enforcement agencies, and a host of other such entities may find such analysis useful. In this article, emotions are studied for three different age groups using basic age-dependent features such as pitch, speech rate, and log energy. The feature sets are clustered for the different age groups using the K-means and fuzzy c-means (FCM) algorithms for the boredom, sadness, and anger states. The authors' results suggest that the K-means algorithm outperforms the FCM algorithm, giving better clustering with lower computation time.
1. Introduction

Facial expressions, gestures, and verbal communication are a few of the modalities used to recognize gender- and age-related individual patterns. Among these modalities, speech is the sole medium available in some settings, particularly telephone conversations, which makes the detection task difficult. The objective is to determine a speaker's emotional conversation pattern based on his or her age. This determination will benefit law enforcement agencies in studying criminal psychology and in further investigation. In particular, knowledge of a speaker's state of mind and emotional attributes can help in assessing the condition of both the victim and the culprit during court hearings and can prevent confusion. Such systems can make it possible to identify intimidating calls, false alarms, kidnappings involving influential people, fanatic religious groups, radicals, etc. (Hämäläinen et al., 2011). Further, the recognition system can help in implementing corrective measures before it is too late when negative emotional attributes are manifested among children. Detecting emotion and age from a speaker's utterances can also benefit human-robot interfaces, telecommunications, intelligent tutoring, smart call center applications, etc.

The vocal tract and vocal folds of the human speech production mechanism keep growing until a child reaches adolescence. Selecting suitable features that represent the age of the speaker thus remains an ever-growing challenge. Recognition systems trained for adult speakers have often proved inefficient when applied to children's utterances (Tanner & Tanner, 2004). This is because the core features representing the speech and emotional content of an utterance vary with the age and gender of the speaker. In particular, the fundamental frequency (F0), formants, speech rate, energy, etc., vary drastically between a child and an adult (Lyakso et al., 2015). Acoustic models built for research and business requirements thus become ineffective when the emotional utterances belong to a different age group. Speaker age and gender have been addressed in the literature during the last decades, although these studies placed little emphasis on the emotional content of speech (Feld, Burkhardt, & Müller, 2010; Porat, Lange, & Zigel, 2010). These authors used Gaussian weight super-vectors with a support vector machine (SVM) classifier for age and gender identification. However, they made no precise study of different age groups or their emotional states. Mel-frequency cepstral coefficients (MFCC) with different feature selection algorithms, such as principal component analysis (PCA) and supervised PCA (SPCA), have been attempted for different age groups (Chaudhari & Kagalkar, 2015). The prominent prosodic features representing the speech emotion of children and adults could not be found in this literature. The absence of a clear boundary among emotions based on age has motivated the authors to undertake this work.

The objective is to cluster the features representing emotional utterances of different age groups. Different clustering approaches, such as fuzzy c-means (FCM), hierarchical, partitioning, density-based, grid-based, model-based, and K-means clustering, have been applied to recognize human emotions (Kaur & Vashish, 2013; Trabelsi, Ayed, & Ellouze, 2016). These authors compared the classification accuracy of the clustering methods using different emotional states. FCM was reported to provide an accuracy of 63.97% on the SROL emotional database using statistical parameters (Zbancioc & Ferarua, 2012). Among these techniques, two approaches, K-means and FCM, are applied in this work to cluster the emotional speech utterances of different age groups. Speech emotions such as boredom, sadness, and anger have been chosen and analyzed separately. K-means is a simple hard-clustering algorithm that can solve well-known clustering problems using unsupervised learning. It is faster than hierarchical clustering and produces tighter clusters. FCM is well suited to recognizing patterns whose clusters overlap, and it is appropriate when the features of a pattern are associated with more than one cluster. The scarcity of work on speech emotion recognition using FCM has also motivated the authors to choose this technique.
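
The contrast between hard clustering (K-means) and soft clustering (FCM) described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The feature values below are synthetic and purely hypothetical (three invented "age groups" with made-up pitch, speech rate, and log energy statistics), and the function names, the fuzzifier m = 2, and the z-score normalization are assumptions chosen for illustration.

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Hard clustering: every sample is assigned to exactly one cluster."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Squared Euclidean distance from every sample to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # New center = mean of its members; an empty cluster keeps its old center.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

def fuzzy_c_means(X, c, m=2.0, iters=100, tol=1e-5, seed=0):
    """Soft clustering: every sample gets a membership degree in every cluster."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)          # each row of memberships sums to 1
    for _ in range(iters):
        W = U ** m
        # Centers are membership-weighted means of the samples.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                  # guard against division by zero
        # Standard FCM update: u_ij proportional to d_ij^(-2/(m-1)).
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, centers

# Hypothetical data: columns are pitch (Hz), speech rate (syllables/s), log energy.
rng = np.random.default_rng(42)
group_means = np.array([[280.0, 3.5, -2.0], [200.0, 4.5, -1.5], [130.0, 4.0, -1.0]])
group_stds = np.array([15.0, 0.3, 0.2])
X = np.vstack([rng.normal(mu, group_stds, size=(40, 3)) for mu in group_means])
X = (X - X.mean(axis=0)) / X.std(axis=0)       # z-score so no feature dominates

labels, km_centers = kmeans(X, k=3)
U, fcm_centers = fuzzy_c_means(X, c=3)
fcm_labels = U.argmax(axis=1)                  # harden memberships for comparison

print("K-means cluster sizes:", np.bincount(labels, minlength=3))
print("FCM cluster sizes:    ", np.bincount(fcm_labels, minlength=3))
```

K-means returns a single label per utterance, whereas FCM returns a membership matrix whose rows sum to one, so an utterance lying between two groups can belong partly to both. The extra membership update is one reason FCM may cost more computation per iteration than K-means.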
