Dimensional Music Emotion Recognition by Machine Learning


Junjie Bai (School of Electrical and Information Engineering, Chongqing University of Science and Technology, Chongqing, China & School of Instrument Science and Engineering, Southeast University, Nanjing, China), Lixiao Feng (School of Electrical and Information Engineering, Chongqing University of Science and Technology, Chongqing, China), Jun Peng (School of Electrical and Information Engineering, Chongqing University of Science and Technology, Chongqing, China), Jinliang Shi (School of Electrical and Information Engineering, Chongqing University of Science and Technology, Chongqing, China), Kan Luo (School of Information Science and Engineering, Fujian University of Technology, Fuzhou, China), Zuojin Li (School of Electrical and Information Engineering, Chongqing University of Science and Technology, Chongqing, China), Lu Liao (School of Electrical and Information Engineering, Chongqing University of Science and Technology, Chongqing, China) and Yingxu Wang (International Institute of Cognitive Informatics and Cognitive Computing (ICIC), Laboratory for Computational Intelligence, Denotational Mathematics and Software Science, Department of Electrical and Computer Engineering, Schulich School of Engineering and Hotchkiss Brain Institute, University of Calgary, Calgary, Canada & Information Systems Lab, Stanford University, Stanford, CA, USA)
DOI: 10.4018/IJCINI.2016100104

Abstract

Music emotion recognition (MER) is a challenging field of study that has been addressed by multiple disciplines, such as cognitive science, physiology, psychology, musicology, and the arts. In this paper, music emotions are modeled as a set of continuous variables, valence and arousal (VA), based on the Valence-Arousal model. MER is formulated as a regression problem in which 548 dimensions of music features are extracted and selected. A wide range of methods, including multivariate adaptive regression splines (MARS), support vector regression (SVR), radial basis function (RBF), random forest regression (RFR), and regression neural networks, are adopted to recognize music emotions. Experimental results show that these regression algorithms achieve good regression performance for MER. The best R² statistics for valence and arousal, 29.3% and 62.5%, are obtained by the RFR and SVR algorithms, respectively, in the Relief feature space.
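The R² statistic reported above measures the fraction of variance in the annotated valence (or arousal) values that a regressor explains. The following is a minimal sketch of the standard coefficient of determination, assuming plain Python lists of annotated and predicted values; the function name `r_squared` and the sample numbers are hypothetical and do not come from the paper's experiments.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after the regressor's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the annotations around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all annotations are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical arousal annotations and model predictions in the (-1, 1) range.
annotated = [0.6, -0.2, 0.4, -0.7, 0.1]
predicted = [0.5, -0.1, 0.2, -0.5, 0.0]
print(f"R^2 = {r_squared(annotated, predicted):.1%}")
```

Expressed as a percentage, this is the form in which the 29.3% and 62.5% figures in the abstract are stated. A model that always predicts the mean annotation scores 0, and a model worse than that scores below 0.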

1. Introduction

Music is not only a form of art but also a language that expresses human emotions, inner moods, and affective information (Juslin, 2001; Rodriguez, Ramos, & Wang, 2012; Wilson & Keil, 2001; Juslin & Sloboda, 2001). It is generally believed that music cannot be composed, performed, or comprehended without affective cognition and involvement. Music expresses emotions including joy, happiness, annoyance, sadness, and pain. Aesthetics and cognitive science recognize music as an effective form of affective expression. However, the experience of music, whether in creating or appreciating it, is subjective: individuals may understand the same piece of music differently and be emotionally affected by it to different extents. Therefore, how to formally evaluate the affective emotions of music is a challenging problem in musicology, aesthetics, psychology, and cognitive science (Juslin, 2001; Hallam & Thaut, 2008). Machine learning is currently an active research area (Wang, 2016, 2015, 2015), and machine learning algorithms have been widely adopted to recognize music emotions (Yang et al., 2008; Bang et al., 2013; Mokhsin et al., 2014; Jens & Sand et al., 2015; Chin & Lin et al., 2013).

In the 1980s, Russell and Thayer proposed the Valence-Arousal model for describing music emotions, which is widely accepted and used by musicologists, aestheticians, and psychologists (Russell, 1980). The model introduces a 2D emotion plane spanned by valence and arousal (VA). In the VA plane, the horizontal axis represents valence, indicating a positive or negative emotion, and the vertical axis represents arousal, indicating an exciting or calming emotion. Both values range over (-1, 1). Within this range, a valence value closer to 1 indicates a more positive emotion, and a value closer to -1 a more negative one. Similarly, a higher arousal value indicates a stronger emotional intensity, and a lower value a weaker one.
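The VA representation described above can be captured as a small data type. The sketch below is illustrative only: `VAPoint` and its `describe` method are hypothetical names, and the range check enforces the open interval (-1, 1) exactly as the text states it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VAPoint:
    """A music emotion as a point on the valence-arousal (VA) plane."""

    valence: float  # horizontal axis: negative (< 0) to positive (> 0) emotion
    arousal: float  # vertical axis: calming (< 0) to exciting (> 0)

    def __post_init__(self) -> None:
        # Both coordinates must lie in the open interval (-1, 1).
        for name, value in (("valence", self.valence), ("arousal", self.arousal)):
            if not -1.0 < value < 1.0:
                raise ValueError(f"{name} must lie in (-1, 1), got {value}")

    def describe(self) -> str:
        """Plain-language reading of the sign of each coordinate."""
        if self.valence > 0:
            tone = "positive"
        elif self.valence < 0:
            tone = "negative"
        else:
            tone = "neutral"
        if self.arousal > 0:
            energy = "high (exciting)"
        elif self.arousal < 0:
            energy = "low (calming)"
        else:
            energy = "neutral"
        return f"{tone} valence, {energy} arousal"


print(VAPoint(0.8, 0.7).describe())  # positive valence, high (exciting) arousal
```

Making the type frozen keeps a labeled emotion from being silently altered after validation.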

For example, as shown in Figure 1, happiness is an emotion of positive valence and high arousal, while sadness is an emotion of negative valence and low arousal. In this way, any music emotion can be mapped to a point in the VA plane, which allows it to be formally represented by a pair of VA values (Thayer, 1989). This dimensional conceptualization of music emotions provides a simple, reliable, and understandable model for practical affective experiments and manipulations.
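Mapping a VA point back to a coarse emotion label amounts to checking which quadrant of the plane it falls in. The following is a minimal sketch under stated assumptions: the happy and sad quadrants follow the example in the text, while the "angry" and "calm" labels for the other two quadrants are hypothetical illustrative choices, not values taken from this paper. The function name `quadrant_label` is likewise hypothetical.

```python
def quadrant_label(valence: float, arousal: float) -> str:
    """Return an illustrative emotion label for a point on the VA plane.

    Points lying exactly on an axis are reported as 'neutral' because
    their quadrant is ambiguous.
    """
    if not (-1.0 < valence < 1.0 and -1.0 < arousal < 1.0):
        raise ValueError("valence and arousal must lie in (-1, 1)")
    if valence == 0 or arousal == 0:
        return "neutral"
    if valence > 0:
        # Positive valence: high arousal reads as happy, low as calm (assumed label).
        return "happy" if arousal > 0 else "calm"
    # Negative valence: high arousal reads as angry (assumed label), low as sad.
    return "angry" if arousal > 0 else "sad"


# Hypothetical predicted VA values for three music clips.
for v, a in [(0.7, 0.6), (-0.6, -0.5), (0.4, -0.3)]:
    print((v, a), "->", quadrant_label(v, a))
```

Collapsing a point to a quadrant discards the magnitudes that the continuous VA representation preserves; the regression formulation in the abstract predicts those continuous values directly.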

Figure 1. The valence-arousal plane of music emotions
