Understanding User-Curated Playlists on Spotify: A Machine Learning Approach

Martin Pichl (University of Innsbruck, Innsbruck, Austria), Eva Zangerle (University of Innsbruck, Innsbruck, Austria) and Günther Specht (University of Innsbruck, Innsbruck, Austria)
DOI: 10.4018/IJMDEM.2017100103

Music streaming platforms enable people to access millions of tracks using computers and mobile devices. However, users cannot manually browse millions of tracks to find music they like. Building recommender systems that suggest music fitting a user's current context is a challenging task. A deeper understanding of the characteristics of user-curated playlists naturally contributes to more personalized recommendations. To understand how users organize music nowadays, we analyze user-curated playlists from the music streaming platform Spotify. Based on the audio features of the tracks, we find an explanation for the differences between playlists using a principal component analysis (PCA) and are able to group playlists using spectral clustering. Our findings about playlist characteristics can be exploited in an SVD-based music recommender system, and our proposed clustering approach for finding groups of similar playlists is easy to integrate into a recommender system using pre- or post-filtering techniques.

In the last decade, new technologies have paved the way for new distribution channels for digital content (e.g., music streaming platforms like Spotify or Apple Music). At the same time, mobile devices such as smartphones or tablets enable their users to access millions of tracks on those streaming platforms in various situations throughout the day. These developments make music organization a highly interesting topic: the challenge for users is to find music they like in the overwhelming variety offered by music streaming platforms. In principle, users need to navigate through their music collection to find the music they want to listen to during different activities or situations (Kamalzadeh, Baur, & Möller, 2012). To assist users in browsing these possibly extensive collections, streaming platforms rely heavily on recommender systems, but also on human editors. A deeper understanding of the characteristics of playlists, in particular of how users curate their playlists, can naturally contribute to more personalized and better recommendations.

In the field of music listening behavior analysis and recommender systems, social media platforms are exploited to gather relevant data. Nowadays, a substantial number of people share what they are listening to at the moment using so-called #nowplaying tweets on Twitter. This makes Twitter, which is the world's leading micro-blogging platform serving 320 million active users, a valuable data source. Twitter has already been exploited for various analyses of user listening behavior (Hauger, Schedl, Košir, & Tkalčič, 2013; Zangerle, Pichl, Gassler, & Specht, 2014) as well as for recommender systems (Pichl, Zangerle, & Specht, 2015; Schedl & Schnitzer, 2014; Zangerle, Gassler, & Specht, 2012). Earlier, automatic playlist generation, as a form of music recommendation, was studied intensively (Alghoniemy & Tewfik, 2001; Aucouturier & Pachet, 2002; Flexer, Schnitzer, Gasser, & Widmer, 2008; Logan, 2002; Pampalk, Pohle, & Widmer, 2005; Pauws & Eggen, 2002). In their analysis of user data derived from WebJay, a former web-based playlist service, Slaney and White (2006) found that people prefer different types of music and that users create playlists biased toward these types of music. Furthermore, Cunningham et al. (2004) have shown that people categorize music according to its intended use. Complementary to this, Kamalzadeh et al. (2012) found that people categorize the music in their libraries by activity and/or mood. However, the general applicability of those qualitative studies is limited by the small size of their datasets in terms of users, playlists and listened tracks. To overcome this data problem, we exploit a recently published dataset of Spotify users (Pichl, Zangerle, & Specht, 2016). This dataset enables a profound quantitative analysis of the musical attributes of the tracks that make up different playlists.

In contrast to the well-researched field of automatic playlist generation, we aim to deepen our understanding of the characteristics of playlists created by human users and hence shift our focus from automatic playlist generation to the analysis of playlists. To conduct this study, we require a dataset containing information about users and their playlists. In a previous analysis, we found that a substantial portion of so-called #nowplaying tweets refer to Spotify (Pichl, Zangerle, & Specht, 2014). In this work, we exploit a dataset containing the Spotify users of the #nowplaying dataset and their playlists (Pichl et al., 2016). In total, we base our analyses on 1,137 users and their 18,296 playlists. We are particularly interested in studying the musical attributes of the tracks that make up different playlists. Therefore, we utilize the Echo Nest acoustic attributes contained in the dataset. Our analyses based on this dataset are driven by the following research questions (RQ):
