Introduction
Research in emotion detection has received considerable attention and is often regarded as a priority research topic. Researchers have applied emotion detection analysis to understand how emotions influence the human decision-making process in various areas such as safe driving (De Nadai et al., 2016), affective computing (Picard, 2000), speech emotion recognition (Schuller, 2018), stress monitoring (Yoon et al., 2016), health care (Guo et al., 2013), teaching (Williams et al., 2012), and cybersecurity (Åhäll & Gregory, 2013; Diamond & Hicks, 2005). For example, researchers identified that stress and depression could be major factors affecting human health (Cummins et al., 2015a). Since negative emotions such as sadness, anger, and fear can increase stress and depression, it is critical to detect those emotions properly so that people can control (or manage) their stress and depression. Numerous approaches have been proposed to recognize emotions, but it is still difficult to identify their unique characteristics computationally. This is mainly because of the limited availability of emotion datasets and the high emotional variability among people. It is also challenging to design a standardized approach for analyzing people's distinctive emotional patterns.
Physical or physiological signals are commonly used to understand human emotions. Physical signals include facial expression (Li et al., 2019), speech (Schuller, 2018), and gesture (García-Magariño et al., 2019). Video cameras and audio recorders are often used to capture facial expressions and voice narrations (e.g., speaking or singing). Physiological signals refer to data captured from the autonomic nervous system (e.g., galvanic skin response, heart rate, and temperature). Different approaches have been proposed to capture physiological signals, such as electroencephalogram (EEG) (Soroush et al., 2019), electrocardiogram (ECG) (Agrafioti et al., 2012), and skin temperature (Kamioka et al., 2019). Although numerous studies have examined the relationships between these signals and emotional changes using either (or both) types of signals, recognizing emotions from them has not been fully studied. In this paper, we focus on extracting features based on wavelet transformation and analyze physical signals (i.e., speech and song) to understand human emotions. Specifically, both speech and song data are utilized to extract novel features for identifying their distinctive differences. In addition, visualization approaches are applied to enhance the ability to represent and understand the unique characteristics of different emotions. To highlight the effectiveness of our proposed approach, a performance evaluation was conducted with different classification algorithms.
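To make the idea of wavelet-based feature extraction concrete, the sketch below decomposes a signal with a simple Haar wavelet and summarizes each decomposition level by its relative energy. This is only an illustrative sketch, not the paper's actual method: the Haar filter, the four-level depth, and the energy summary are assumptions chosen for brevity (a real pipeline would typically use a library such as PyWavelets with a mother wavelet suited to speech).

```python
import numpy as np

def haar_dwt_level(x):
    """One level of the Haar discrete wavelet transform.

    Returns (approximation, detail) coefficients, each half the input length.
    """
    x = x[: len(x) // 2 * 2]  # truncate to an even length
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)
    return approx, detail

def wavelet_energy_features(signal, levels=4):
    """Relative energy of the detail coefficients at each decomposition level.

    Such per-band energy ratios are one common way to turn a wavelet
    decomposition into a fixed-length feature vector for a classifier.
    """
    energies = []
    approx = np.asarray(signal, dtype=float)
    for _ in range(levels):
        approx, detail = haar_dwt_level(approx)
        energies.append(float(np.sum(detail ** 2)))
    total = sum(energies) + float(np.sum(approx ** 2))
    return [e / total for e in energies]

# Example: a synthetic "speech-like" signal (sum of two tones).
t = np.linspace(0.0, 1.0, 1024, endpoint=False)
sig = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 200 * t)
features = wavelet_energy_features(sig, levels=4)
print([round(f, 3) for f in features])
```

In this sketch, each recording yields one feature vector of length `levels`, which could then be fed to any of the classification algorithms mentioned above.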
This paper consists of six sections. It begins by reviewing related work. We then provide a detailed explanation of our proposed approach and the emotion dataset in the Methodology section. After presenting the results of our studies, we discuss the implications of this work and directions for future research.