In this chapter, we introduce alternative ways to access digital audio collections. We give an overview of existing applications based on two-dimensional, map-like representations of music collections. Further, we explain two applications for accessing audio files that are based on the Self-Organising Map, an unsupervised neural network model. These two applications, PlaySOM and PocketSOM, are explained in greater detail, with special attention to their unique properties and their implementations for several mobile devices. The examples are intended to raise readers’ interest in alternative interfaces to large audio collections. We also hope to show that alternative interfaces are feasible for both desktop computers and mobile devices and offer a practical approach to pressing issues in accessing digital collections.
Digital Libraries Of Audio Collections
An increasing number of users adopt new technologies such as MP3 players and manage their audio collections digitally. The growing private use of digital media files (i.e., audio and, more recently, video files) is driven not only by the wide availability of personal audio devices such as the Apple iPod™; the music industry is also adopting new distribution channels, and an increasingly large user base is buying its music online. This makes the need for advanced methods to browse and search for music more pressing than ever. The strong demand, from a rising number of users and providers, for feasible means of navigating ever-growing numbers of digital items shows the potential that new and more sophisticated approaches could hold.
Text-based searches for artist and track names, as well as browsing through collections that are hierarchically structured according to artist, album, and track categories, constitute the de facto standard for accessing music collections on PCs and mobile devices. However, new means for visualising and exploring large audio collections are being developed based on automatic analysis of the acoustic content of the audio files. Different visualisations have been proposed, many of which use a two-dimensional landscape onto which music files are mapped. Most of them incorporate some kind of clustering (i.e., a mapping from a high-dimensional feature space to a usually two-dimensional output space). A particularly interesting effect of using such unsupervised learning techniques is the potential to overcome problems stemming from manually assigned genre tags, since such tags may not suit every user or may simply be wrong (Pachet & Cazaly, 2000). The approaches offer varying interaction possibilities, such as drawing trajectories on the map, selecting the music underneath, or marking regions on a map, which are discussed in detail in this chapter.
Torrens, Hertzog, and Arcos (2004) proposed a disc and a rectangle visualisation for displaying and manipulating playlists. The disc visualisation gives a better visual idea of the proportions within the collection, whereas zooming is more usable with the rectangle visualisation.
Several teams have been working on user interfaces based on the Self-Organising Map (SOM). The SOM is an unsupervised neural network that provides a topology-preserving mapping from a high-dimensional feature space onto a two-dimensional map, in such a way that data points close to each other in input space are mapped onto adjacent areas of the output space. The SOM has been used extensively to provide visualisations of and interfaces to a wide range of data, ranging from control interfaces for industrial processing plants (Kohonen, Oja, Simula, Visa, & Kangas, 1996) to access interfaces for digital libraries of text documents (Rauber & Merkl, 2003).
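The mapping described above can be illustrated with a minimal sketch of SOM training. This is not the implementation used by any of the systems cited in this chapter; the function names `train_som` and `best_matching_unit`, the linearly decaying learning rate and neighbourhood radius, and the small three-dimensional "feature vectors" are all assumptions made for illustration. The sketch only shows the general principle: for each input, the closest unit (the best-matching unit) and its neighbours on the grid are moved towards that input.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid coordinate whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda unit: sum((w - v) ** 2 for w, v in zip(weights[unit], x)),
    )


def train_som(data, rows, cols, epochs=100, lr0=0.5, seed=0):
    """Train a rows x cols map on a list of equal-length feature vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    # One randomly initialised weight vector per unit of the 2D grid.
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    sigma0 = max(rows, cols) / 2.0  # initial neighbourhood radius (assumed)
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            progress = step / total_steps
            lr = lr0 * (1.0 - progress)                  # decaying learning rate
            sigma = max(sigma0 * (1.0 - progress), 0.5)  # shrinking radius
            bmu = best_matching_unit(weights, x)
            for unit, w in weights.items():
                # Distance is measured on the grid, not in feature space.
                grid_d2 = (unit[0] - bmu[0]) ** 2 + (unit[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma * sigma))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


if __name__ == "__main__":
    # Made-up feature vectors forming two loose groups (illustrative only).
    tracks = {
        "track_a": [0.10, 0.20, 0.10],
        "track_b": [0.15, 0.10, 0.20],
        "track_c": [0.90, 0.80, 0.95],
        "track_d": [0.85, 0.90, 0.80],
    }
    som = train_som(list(tracks.values()), rows=4, cols=4)
    for name, vector in tracks.items():
        print(name, "->", best_matching_unit(som, vector))
```

Because neighbouring units are updated together, vectors that are similar in feature space tend to end up on the same or adjacent units, which is the property the map-based interfaces discussed here rely on. In a music application the input vectors would be audio features extracted from each file rather than hand-written numbers.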
A SOM-based interface for digital libraries of music, the SOM-enhanced JukeBox (SOMeJB), was first proposed in Rauber and Frühwirth (2001), with more advanced visualisations and improved feature sets presented in Pampalk, Rauber, and Merkl (2002) and Rauber, Pampalk, and Merkl (2003). Since then, several other systems have been created based on these principles, such as the MusicMiner (Mörchen, Ultsch, Nöcker, & Stamm, 2005), which uses an emergent SOM. A very appealing three-dimensional user interface is presented in Knees, Schedl, Pohle, and Widmer (2006); it automatically creates a three-dimensional musical landscape via a SOM for small private music collections. Navigation through the map is done via a video game pad, and additional information such as labelling is provided using Web data and album covers.
A mnemonic SOM (i.e., a Self-Organising Map of a certain shape other than a rectangle) is used to cluster the complete works of the composer Wolfgang Amadeus Mozart to create the Map of Mozart (Mayer, Lidy, & Rauber, 2006). The shape of the SOM is a silhouette of the composer, leading to interesting clusterings (for example, the accumulation of string ensembles in the region of Mozart’s right ear). An online demo is available at http://www.ifs.tuwien.ac.at/mir/mozart.
Key Terms in this Chapter
Audio Features: An abstract representation of pieces of digital music. Audio features are computed from the raw audio signal. Simple features are the number of zero crossings of the audio signal or its centroid (as, for example, defined in the MPEG-7 standard). More sophisticated approaches, such as MP3-based features, Rhythm Patterns, Rhythm Histograms, or Statistical Spectrum Descriptors, take into account, for instance, findings from psycho-acoustics.
Audio Streaming: Music can not only be played from a local hard disk but can also be “streamed” over networks (i.e., the playback of a file can start before its download has finished). This technique is also highly relevant for live streams (e.g., Internet radio).
Human-Computer Interaction (HCI): Studies the interaction between computers and human users. HCI deals with both software and hardware interfaces to computer systems.
Digital Audio: Music files available in digital form. Formats with “lossy” compression, like MP3 or Ogg Vorbis, are the most widely available, since they require less disk space than “lossless” formats, such as the Wave format.
Information Visualisation: Is concerned with the visualisation of complex or very high-dimensional data. Its main goal is to provide an intuitive and/or simplified view of more complex issues.
Self-Organising Map: An unsupervised neural network model. Its main application is clustering of high-dimensional data onto two-dimensional maps for explorative data analysis and visualisation.
Mobile Device: Computers or multimedia devices other than desktop machines. In the context of portable audio, PDAs, MP3 players, and mobile phones with playback capabilities can all be used to play music.
Music Information Retrieval (MIR): An area of Information Retrieval concerned with objects from the audio domain. Contrary to classic Information Retrieval, which deals with (text) documents in general, MIR deals with the analysis and retrieval of files from the music domain in audio (e.g., WAV, MP3) or symbolic (e.g., MIDI, scores) form.
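The simplest feature named under Audio Features above, the number of zero crossings, can be sketched in a few lines. The code below is only an illustration and not the definition used in the MPEG-7 standard or in any system cited in this chapter; the function names `zero_crossings` and `sine`, the convention that a sample of exactly zero counts as non-negative, and the synthetic test signal are assumptions.

```python
import math


def zero_crossings(samples):
    """Count sign changes between consecutive samples.

    Assumption: a sample of exactly zero is treated as non-negative.
    """
    count = 0
    for prev, cur in zip(samples, samples[1:]):
        if (prev >= 0) != (cur >= 0):
            count += 1
    return count


def sine(freq_hz, sample_rate, n_samples):
    """Synthetic sine tone, sampled half a step off the grid so that no
    sample falls exactly on a zero of the waveform."""
    return [
        math.sin(2.0 * math.pi * freq_hz * (n + 0.5) / sample_rate)
        for n in range(n_samples)
    ]


if __name__ == "__main__":
    low = sine(5, 1000, 1000)    # one second of a 5 Hz tone
    high = sine(50, 1000, 1000)  # one second of a 50 Hz tone
    print("5 Hz tone:", zero_crossings(low))
    print("50 Hz tone:", zero_crossings(high))
```

A higher-pitched or noisier signal crosses zero more often, which is why this count can serve as a rough descriptor of the raw audio signal.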