Creating emotionally sensitive machines will significantly enhance human-machine interaction. In this chapter we focus on enabling this ability for music. Music is extremely powerful at inducing emotions, and if machines can apprehend the emotions expressed in music, they gain a relevant competence for communicating with humans. We review the theories of music and emotion, and detail different representations of musical emotions from the literature, together with the related musical features. Then, we focus on techniques for detecting the emotion in music from audio content. As a proof of concept, we detail a machine learning method for building such a system. We also review current state-of-the-art results, provide evaluations, and give some insights into possible applications and future trends of these techniques.
Section 1. Music And Emotions: Emotion In Music And Emotions From Music
To study the relationship between music and emotion, we have to consider the literature from many fields. Indeed, relevant scientific publications about this topic can be found in psychology, sociology, neuroscience, cognitive science, biology, musicology, machine learning and philosophy. We focus here on work that aims to understand the emotional process in music, and to represent and model the emotional space. We also detail the main results regarding the pertinent musical features and how they can be used to describe and convey emotions.
Key Terms in this Chapter
Music Categorization: models consider that perceptual, cognitive or emotional states associated with music listening can be defined by assigning them to one of many predefined categories. Categories are a basic survival tool for reducing the complexity of the environment: they assign different physical states to the same class and make comparison between different states possible. It is by means of categories that musical ideas and objects are recognized, differentiated and understood. Applied to music and emotion, this implies that different emotional classes are identified and that pieces of music or excerpts are grouped according to them. Music categories are usually defined by the presence or absence of particular musical features.
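The idea that a category can be defined by the presence or absence of musical features can be sketched as a toy rule-based categorizer. The feature names, rules and category labels below are illustrative assumptions, not taken from the chapter:

```python
# Toy categorizer: assigns an emotional category to an excerpt from the
# presence or absence of (hypothetical) musical features.
def categorize(features):
    """features: set of feature names detected in an excerpt."""
    if "major_mode" in features and "fast_tempo" in features:
        return "happy"
    if "minor_mode" in features and "slow_tempo" in features:
        return "sad"
    if "minor_mode" in features and "fast_tempo" in features:
        return "tense"
    return "neutral"

categorize({"major_mode", "fast_tempo"})  # -> "happy"
```

A real system would of course learn such rules from annotated data rather than hard-code them, but the principle of mapping feature presence to a predefined class is the same.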
Musical Features: are the concepts, based on music theory, music perception or signal processing, that are used to analyze, describe or transform a piece of music. As such, they constitute the building blocks of any Music Information Retrieval system. They can be global for a given piece of music (e.g., key or tonality) or time-varying (e.g., energy). Musical features have associated numerical or textual values. Their similarities and differences make it possible to build predictive models of more complex or composite features in a hierarchical way.
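The distinction between time-varying and global features can be illustrated with a toy energy feature: RMS energy computed per frame is time-varying, while its average over the piece is global. The frame size and the synthetic signal below are illustrative assumptions:

```python
import math

def rms_energy(frame):
    """Root-mean-square energy of one frame of audio samples."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def frame_energies(signal, frame_size=4):
    """Time-varying feature: RMS energy per non-overlapping frame."""
    return [rms_energy(signal[i:i + frame_size])
            for i in range(0, len(signal) - frame_size + 1, frame_size)]

# A toy "signal": a quiet segment followed by a loud one.
signal = [0.1, -0.1, 0.1, -0.1, 0.8, -0.8, 0.8, -0.8]
energies = frame_energies(signal)               # time-varying: [0.1, 0.8]
global_energy = sum(energies) / len(energies)   # global: 0.45
```

Real MIR systems extract such features from spectral representations of the audio, but the global/time-varying distinction is the same.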
Supervised Learning: is a machine learning technique for automatically learning by example. A supervised learning algorithm generates a function that predicts outputs from input observations. The function is generated from training data, which consists of input observations paired with the desired outputs. From these examples, the algorithm aims to generalize properly from the observed input/output pairs to unobserved cases. We speak of regression when the output is a continuous value and of classification when the output is a label. Supervised learning is opposed to unsupervised learning, where the outputs are unknown and the algorithm instead aims to find structure in the data. There are many supervised learning algorithms, such as Support Vector Machines, Nearest Neighbors, Decision Trees, Naïve Bayes or Artificial Neural Networks.
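As a minimal sketch of classification by example, the following implements a 1-nearest-neighbor classifier in pure Python: it predicts, for an unobserved input, the label of the closest training observation. The toy observations and labels are illustrative assumptions:

```python
import math

def nearest_neighbor_predict(train, x):
    """1-nearest-neighbor classification: return the label of the
    training observation closest to input x (Euclidean distance)."""
    closest, label = min(train, key=lambda pair: math.dist(pair[0], x))
    return label

# Training data: (input observation, desired output label) pairs.
train = [((0, 0), "calm"), ((0, 1), "calm"),
         ((5, 5), "excited"), ((5, 6), "excited")]

nearest_neighbor_predict(train, (1, 0))  # -> "calm"
nearest_neighbor_predict(train, (4, 5))  # -> "excited"
```

The same interface (fit on labeled pairs, predict on new inputs) is shared by all the supervised algorithms listed above; only the way the function generalizes differs.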
Music Information Retrieval: (MIR) is an interdisciplinary science aimed at studying the processes, systems and knowledge representations required for retrieving information from music. The music can be in symbolic format (e.g., a MIDI file), in audio format (e.g., an mp3 file), or in image format (e.g., a scanned score). MIR research takes advantage of technologies and knowledge derived from signal processing, machine learning, music cognition, database management, human-computer interaction, music archiving and the sociology of music.
Personal Music Assistants: are devices that help their user find relevant music, provide the right music at the right time, and learn the user's profile and musical taste. Nowadays, mp3 players act as personal music assistants, possibly with access to a recommendation engine. Adding new capabilities, such as detecting emotions or sensing the mood and movements of the user, will make these devices “intelligent” and able to find music that triggers particular emotions.
Support Vector Machine: (SVM) is a supervised learning classification algorithm widely used in machine learning. It is known to be efficient, robust and to give relatively good performance. In the context of a two-class problem in n dimensions, the idea is to find the “best” hyperplane separating the points of the two classes. This hyperplane, of n-1 dimensions, can be found directly in the feature space, in which case the SVM is a linear classifier. Otherwise, it can be found in a transformed space of higher dimensionality using kernel methods, in which case the SVM is a non-linear classifier. The position of a new observation relative to the hyperplane determines the class of the new input.
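As a sketch of the linear case, the following trains a minimal linear SVM using the Pegasos sub-gradient method (hinge loss plus L2 regularization). This is not the chapter's implementation; the toy data, the hyperparameters, and the omission of a bias term (the hyperplane passes through the origin) are simplifying assumptions:

```python
import random

def train_linear_svm(data, lam=0.01, epochs=200, seed=0):
    """Train a linear SVM with the Pegasos sub-gradient method.
    data: list of ((x1, x2), y) pairs with y in {-1, +1}.
    Returns the weight vector w of the separating hyperplane."""
    rng = random.Random(seed)
    w = [0.0, 0.0]
    t = 0
    for _ in range(epochs):
        for x, y in rng.sample(data, len(data)):
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y * (w[0] * x[0] + w[1] * x[1])
            # Shrink w (sub-gradient of the L2 regularizer)...
            w = [(1.0 - eta * lam) * wi for wi in w]
            # ...and step toward y*x when the margin constraint is violated.
            if margin < 1:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w

def predict(w, x):
    """Classify x by which side of the hyperplane it falls on."""
    return 1 if w[0] * x[0] + w[1] * x[1] >= 0 else -1

# Two linearly separable toy classes.
data = [((2, 2), 1), ((3, 2), 1), ((2, 3), 1),
        ((-2, -2), -1), ((-3, -2), -1), ((-2, -3), -1)]
w = train_linear_svm(data)
```

In practice one would use an established library implementation; the sketch only shows the core idea of fitting a maximum-margin separating hyperplane.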
Music Dimensional Models: consider that perceptual, cognitive or emotional states associated with music listening can be defined by a position in a continuous multidimensional space, where each dimension stands for a fundamental property common to all the observed states. Pitch, for example, is considered to be defined by a height dimension (how high or low a tone is) and a chroma dimension (the note class it belongs to, i.e., C, D, E, etc.). Two of the most widely accepted dimensions for describing emotions were proposed by Russell (Russell, 1980): valence (positive versus negative affect) and arousal (low versus high level of activation). This variety of dimensions could be seen as different expressions of a very small set of basic concepts.
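As a toy illustration of the valence-arousal plane, the following sketch maps a (valence, arousal) coordinate to a coarse quadrant label. The labels are an illustrative simplification of Russell's circumplex, not a definition from the chapter:

```python
def quadrant(valence, arousal):
    """Map a point in the valence-arousal plane (both in [-1, 1]) to a
    coarse quadrant label. Labels are illustrative simplifications."""
    if valence >= 0:
        return "happy/excited" if arousal >= 0 else "calm/content"
    return "angry/afraid" if arousal >= 0 else "sad/depressed"

quadrant(0.8, 0.7)    # positive valence, high arousal -> "happy/excited"
quadrant(-0.5, -0.5)  # negative valence, low arousal -> "sad/depressed"
```

Dimensional emotion-detection systems predict the continuous (valence, arousal) coordinates themselves, typically by regression; collapsing them to quadrants recovers a categorical view when needed.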