Expressing Musical Features, Class Labels, Ontologies, and Metadata Using ACE XML 2.0

Cory McKay (Marianopolis College, Canada) and Ichiro Fujinaga (McGill University, Canada)
DOI: 10.4018/978-1-4666-2497-9.ch003


This chapter includes a critical review of existing file formats that have been used in MIR research. A set of design priorities is then proposed for developing new formats and improving existing ones. The details of the ACE XML specification are described in this context. Finally, research priorities for the future are discussed, as well as possible uses for ACE XML outside the specific domain of MIR.
Chapter Preview


Music Information Retrieval and Automatic Music Classification

Music Information Retrieval, or MIR (Downie, 2003), is a research domain that investigates theoretical and practical issues relating to the extraction of information of any kind from music and to making this information accessible. MIR has a broad scope that includes symbolic musical data (musical scores, MIDI files, etc.), audio data (MP3 recordings, AIFF recordings, etc.) and cultural data (sales statistics, album art, etc.).

MIR researchers often make use of machine learning and data mining algorithms (Duda, Hart, & Stork, 2001; Witten & Frank, 2005) to extract meaningful information from music. This typically involves having computers automatically classify music in some way. For example, machine learning can be used to automatically classify music by title, genre, performer, composer, mood, geographical origin, etc. Other MIR tasks such as automatic pitch recognition, chord identification, key finding, melodic segmentation, tempo tracking, instrument identification, and structural segmentation also often make use of automatic classification technology.

Musical machine learning typically involves the following basic tasks:

  • The collection of a musical dataset that can be used to “train” machine learning algorithms. This dataset is typically (although not always) annotated with labels, called “classes,” that are considered to be reliably correct. Class labels are domain-specific, and can specify information such as song titles, genre names, etc. Such annotated datasets are called “ground truth.” Each individual piece of data comprising the dataset (e.g., an audio recording, a musical score, an album art image, etc.) is called an “instance.”

  • Before training can actually be performed, the dataset must first have “features” extracted from it. Features are characteristic pieces of information of any kind that are believed to be potentially useful in discriminating between classes. Examples include low-level signal processing oriented features such as Spectral Flux, higher-level musical information such as the Range of Melodic Arcs, perceptually oriented information such as MFCCs, and cultural features such as Yearly Sales Volume. Sometimes “dimensionality reduction” algorithms are used to automatically determine which features, or which components of features, are the most likely to be useful for a given classification task.

  • Training is then performed, whereby the machine learning algorithms use various methods to learn to associate particular feature patterns to particular classes. The ground truth is typically partitioned into disjoint “training,” “testing,” and sometimes “publication” sets in order to help evaluate the reliability of learned models.
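The workflow above can be sketched with a toy example. The sketch below uses synthetic feature vectors, invented class names ("baroque" and "jazz"), and a simple nearest-centroid classifier purely for illustration; none of these specifics come from the chapter, and real MIR experiments would use actual extracted features and more sophisticated learners.

```python
import numpy as np

# Synthetic stand-ins for extracted feature vectors: each row is an
# instance (e.g., one recording), each column a feature value.
# The class names and feature dimensions are illustrative only.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 4)),   # instances of class "baroque"
               rng.normal(3.0, 1.0, (50, 4))])  # instances of class "jazz"
y = np.array(["baroque"] * 50 + ["jazz"] * 50)  # ground-truth class labels

# Partition the ground truth into disjoint training and testing sets.
idx = rng.permutation(len(X))
train, test = idx[:70], idx[70:]

# "Training": learn one centroid per class from the training features.
classes = np.unique(y)
centroids = np.array([X[train][y[train] == c].mean(axis=0) for c in classes])

# "Testing": assign each held-out instance to the nearest class centroid,
# then measure agreement with the ground-truth labels.
dists = np.linalg.norm(X[test][:, None, :] - centroids[None, :, :], axis=2)
predicted = classes[dists.argmin(axis=1)]
accuracy = (predicted == y[test]).mean()
```

Because the held-out testing set is disjoint from the training set, the resulting accuracy estimates how well the learned model generalizes rather than how well it memorized its training data.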

The effective implementation of musical machine learning requires well-thought-out representations of musical information: the information must be presented to algorithms in usable forms, and it must be possible to interpret the algorithms' output in musically meaningful ways. Consequently, the particular file formats used in MIR can have an immense impact on the kinds of research that can be performed and on the quality of the results.

The overall efficiency of MIR research hinges on the ability of researchers to share data effectively with one another. Information such as ground-truth annotations, for example, can be very expensive to produce, and a great deal of repeated effort is avoided when researchers can share such information easily. Similarly, training and testing datasets themselves can be expensive to acquire, and since they typically cannot be distributed directly because of copyright restrictions, the ability to share extracted feature values can be very valuable. The ability to communicate metadata about features, instances, and classes can also be very useful.

Well-defined, flexible, and expressive standardized file formats are essential for distributing such information efficiently and effectively. The absence of such standardized formats poses a serious obstacle to the sharing of research information: each lab then tends to generate its own in-house data and file formats, leading to both wasteful duplicated effort and, in general, lower-quality data.
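To make the benefit of a shared, machine-readable format concrete, the sketch below serializes per-instance feature values to XML and parses them back. The element and attribute names here are invented for illustration and do not conform to the actual ACE XML 2.0 schema; the point is only that a standardized file, unlike an ad hoc in-house format, can be read by any lab's software without custom parsing code.

```python
import xml.etree.ElementTree as ET

# Build a toy feature-value file: one instance with two feature values.
# Tag and attribute names are hypothetical, not the real ACE XML schema.
root = ET.Element("feature_values")
inst = ET.SubElement(root, "instance", id="recording_001.mp3")
for name, value in [("SpectralFluxMean", "0.173"), ("MFCC_1", "-2.41")]:
    feat = ET.SubElement(inst, "feature", name=name)
    feat.text = value
xml_text = ET.tostring(root, encoding="unicode")

# A receiving lab can recover the instance and its features directly.
parsed = ET.fromstring(xml_text)
instance_id = parsed.find("instance").get("id")
names = [f.get("name") for f in parsed.iter("feature")]
```

A real interchange format would additionally need to express metadata about the features themselves (dimensionality, extraction parameters, etc.), which is part of what motivates a richer specification such as ACE XML.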
