Music Emotion Recognition-Based Business-Oriented Visualization Framework Using AI-driven Serverless Cloud Computing

Muhammed Golec (Queen Mary University of London, UK), Lifeng Zhu (Queen Mary University of London, UK), Emir Sahin Hatay (University of Essex, UK), Han Wang (Queen Mary University of London, UK), and Sukhpal Singh Gill (Queen Mary University of London, UK)
Copyright: © 2025 |Pages: 26
DOI: 10.4018/IJBAN.373258

Abstract

This paper proposes a novel framework for a real-time music visualization system designed for the hearing-impaired, utilizing AI and serverless computing. The system converts audio signals into visual representations that capture both the physical and emotional aspects of music. A neural network-based Music Emotion Recognition (MER) model extracts emotional cues, which are integrated into the visualizations. Serverless computing ensures accessibility, while an account management system and a comment collection system enable customization and regular retraining of the model for better accuracy. Results demonstrate the framework's effectiveness and highlight the scalability and cost-efficiency of serverless computing. This work advances music accessibility for the hearing-impaired, enhancing sensory experiences and promoting mental well-being. The MER model achieves a 46.8% lower Root Mean Squared Error (RMSE) than other works targeting the same 10-second audio fragment length, and a 13.5% higher Pearson's correlation coefficient (PCC) for 30-second fragments.

1. Introduction

The World Health Organization (WHO) estimates that around 1.3 billion people, or 16% of the global population, live with significant disabilities (World Health Organization [WHO], 2024). About 43 million of them are blind (International Agency for the Prevention of Blindness, 2020) and 430 million of them are deaf (World Health Organization [WHO], 2024). Facilities for people with disabilities are now actively developed and widely deployed, bringing more convenience to daily life, such as tactile paving, braille buttons, and audible traffic signals. Physical difficulties aside, the mental health of people with disabilities has always been a significant problem that should not be ignored (Salako, I., 2017). Facilities and equipment that support their mental well-being therefore also deserve attention, such as aids that help blind people visit a museum or deaf people attend a concert.

Deafness and hearing loss are found in all regions and countries. Approximately one-fifth of the global population lives with hearing loss, and 430 million of these people have disabling hearing loss (World Health Organization [WHO], 2024). There could be over 700 million people with disabling hearing loss by 2050 (International Agency for the Prevention of Blindness, 2020). 34 million children have deafness or hearing loss, and about 30% of people over 60 have hearing loss (World Health Organization [WHO], 2024). In addition, the ability to hear low-magnitude or high-frequency sound degrades significantly with age (Oh, Lee, Park, Kim, Chung, Kim, & Yeo, 2014), which also affects the ability to perceive music.

Medical intervention is one solution. Devices such as bone-anchored hearing devices (Baker, S., Centric, A., & Chennupati, S. K., 2015) can be fitted through specific surgical procedures. However, implants have been shown not to provide sufficient assistance for hearing loss from every cause (Lupo, J. E., Biever, A., & Kelsall, D. C., 2020), and the cost of implant surgery is not affordable for every family. Another family of solutions for helping people with hearing loss enjoy music is to transform the music into other communication modalities more familiar to deaf people. This approach shows greater potential for availability and extensibility, supported by rapidly developing technology.

Music information can be roughly separated into two kinds of properties, physical and perceptual. The physical part consists of amplitude, frequency spectrum and timbre, while the perceptual part represents the emotion the music carries. To pass this information to deaf listeners, the key is to extract these characteristics from the audio stream. The best-known system for representing physical features is pitch + magnitude + duration, which is used in the Musical Instrument Digital Interface (MIDI) file specification (MIDI Manufacturers Association, 2023). In the MER field, Russell (Russell, J. A., 1980) proposed a circumplex model consisting of two dimensions, valence and arousal, which was then adapted and improved by Thayer (Thayer, R., 1989), forming the widely used valence-arousal model. Valence indicates how pleasant the mood is (displeasure to pleasure) and arousal indicates how excited it is (calm to excited).
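
The two descriptions above can be sketched as simple data structures. This is a minimal illustration, not the paper's implementation: the class names `NoteEvent` and `EmotionPoint`, the use of seconds for duration, the [-1, 1] range for valence and arousal, and the four quadrant labels are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NoteEvent:
    """Physical view of one note: pitch + magnitude + duration."""
    pitch: int       # MIDI note number, 0-127 (60 is middle C)
    magnitude: int   # MIDI velocity, 0-127
    duration: float  # seconds (assumed unit for this sketch)


@dataclass(frozen=True)
class EmotionPoint:
    """Perceptual view: a point in the valence-arousal plane."""
    valence: float  # displeasure (-1) to pleasure (+1), assumed range
    arousal: float  # calm (-1) to excited (+1), assumed range

    def quadrant(self) -> str:
        """Return an illustrative label for the quadrant of this point.

        Values of exactly 0 are treated as positive; the labels are
        hypothetical names, not categories defined by the source.
        """
        if self.valence >= 0:
            return "excited" if self.arousal >= 0 else "relaxed"
        return "tense" if self.arousal >= 0 else "sad"


# A loud middle C held for half a second, in a pleasant, energetic passage.
note = NoteEvent(pitch=60, magnitude=100, duration=0.5)
mood = EmotionPoint(valence=0.7, arousal=0.6)
print(note, mood.quadrant())
```

Keeping the physical and perceptual descriptions in separate types mirrors the split described above: note events change many times per second, whereas an emotion estimate would typically be produced once per audio fragment.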
