Applying Machine Learning in Optical Music Recognition of Numbered Music Notation

Fu-Hai Frank Wu (National Tsing Hua University, Hsinchu, Taiwan)
DOI: 10.4018/IJMDEM.2017070102

Although optical music recognition (OMR) has been studied for a few decades, most efforts have gone into image processing steps aimed at the highest possible accuracy, and evaluations have not shared common ground. Moreover, the notation most often explored has been conventional Western staff notation. In contrast, the authors explore the challenges of numbered music notation, which is popular in Asia and used in daily life for sight-reading. The authors take a different route to improving recognition accuracy: they apply elementary image processing with coarse tuning and supplement it with machine learning methods. The major contributions of this work are a machine learning architecture designed for this task, the dataset, and the evaluation metrics, which indicate the performance of the OMR system, provide an objective function for machine learning, and highlight the challenges posed by music scores written in this notation.
Article Preview


Music notation is the most important medium for encoding music to be played and read; many classical masterpieces encoded in staff notation, the conventional Western music notation (CWMN) (Rastall, 1983), have therefore stayed alive for hundreds of years. Although staff notation is the most popular notation worldwide, other musical notations exist and are used in daily life. For example, numbered music notation (Huang, 2008), called ‘jiǎnpǔ’ in Mandarin pinyin, is widely accepted in Asia. This research focuses on sheet music written in that notation. The authors develop the processing engines, build a complete Optical Music Recognition (OMR) ecosystem, and perform experiments. The ecosystem comprises a dataset, evaluation metrics, ground truth, and OMR processing engines.

The major differences between staff notation and numbered musical notation lie in the types of musical glyphs, for example the characteristic staves and digits, respectively. To name a few differences: for pitch, the note head position relative to the staff lines is replaced by the digits {1, 2, 3, 4, 5, 6, 7} and octave dots ‘.’ under or above the digit (see blue box in Figure 1); the variety of rest symbols is reduced to combinations of note length glyphs and the digit ‘0’. Note length (duration), shown in staff notation by the note head and flag, is instead expressed by the number of ‘-’, ‘.’, and ‘_’ glyphs attached to the note digits (see red circles in Figure 1). On the other hand, some musical glyphs, for instance ties and volta brackets, are common to both notations.
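The glyph-to-meaning mapping described above can be sketched as a small data structure. The following Python sketch is illustrative only and is not the authors' implementation: the class name `JianpuNote`, its field names, the default tonic of middle C (MIDI note 60), and the major-scale reading of the digits are all assumptions. The duration rules follow the usual jianpu convention, in which a plain digit lasts one beat, each underline halves it, each dash adds one beat, and each dot adds half of the preceding value.

```python
from dataclasses import dataclass
from fractions import Fraction

# Semitone offsets of scale degrees 1..7 above the tonic (major scale assumed).
MAJOR_SCALE_OFFSETS = {1: 0, 2: 2, 3: 4, 4: 5, 5: 7, 6: 9, 7: 11}


@dataclass(frozen=True)
class JianpuNote:
    """Hypothetical container for one digit note and its attached glyphs."""
    digit: int            # 0 is a rest, 1..7 are scale degrees
    octave_dots: int = 0  # positive: dots above the digit, negative: dots below
    underlines: int = 0   # number of '_' glyphs under the digit
    dashes: int = 0       # number of '-' glyphs following the digit
    aug_dots: int = 0     # number of duration '.' glyphs beside the digit

    def midi_pitch(self, tonic=60):
        """Return a MIDI note number, or None for a rest (digit 0)."""
        if self.digit == 0:
            return None
        return tonic + MAJOR_SCALE_OFFSETS[self.digit] + 12 * self.octave_dots

    def beats(self):
        """Return the note length in beats as an exact fraction."""
        base = Fraction(1, 2 ** self.underlines)
        # n dots scale the base length by (2 - 1/2**n): 1, 3/2, 7/4, ...
        dotted = base * (2 - Fraction(1, 2 ** self.aug_dots))
        return dotted + self.dashes


if __name__ == "__main__":
    high_five = JianpuNote(digit=5, octave_dots=1)
    half_beat_rest = JianpuNote(digit=0, underlines=1)
    print(high_five.midi_pitch(), high_five.beats())
    print(half_beat_rest.midi_pitch(), half_beat_rest.beats())
```

Using exact fractions rather than floats keeps durations such as 3/4 of a beat free of rounding error when note lengths are summed per measure. Accidentals, which the notation also supports, are left out of this sketch.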

The research dataset consists of the first 110 music score manuscripts of a hands-on songbook for singing reference. The manuscripts are divided into two parts: a training set of 100 short scores with an average of 74 digit notes per sheet, and a test set of 10 pieces of sheet music with 65 digit notes per sheet. In addition to digit notes, the sheet music includes dynamic notations, such as ties and slurs, structural notations that indicate the flow of notes, and accidentals, flat or sharp, that modify the pitch of notes. Unsurprisingly, accidentals and structural notations occur too rarely to be studied well. The system nevertheless includes them in its processing steps and evaluation metrics for completeness of notation recognition.
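For a rough sense of scale, the per-sheet figures above imply approximate note totals. The short calculation below is only a sketch: the variable names are illustrative, it assumes the figure of 65 is also an average, and because the averages are presumably rounded, the totals are estimates rather than counts reported by the authors.

```python
# Figures taken from the dataset description; totals are estimates.
train_sheets, train_avg = 100, 74
test_sheets, test_avg = 10, 65

train_total = train_sheets * train_avg  # estimated digit notes in training set
test_total = test_sheets * test_avg     # estimated digit notes in test set
overall_avg = (train_total + test_total) / (train_sheets + test_sheets)

print(train_total, test_total, round(overall_avg, 1))
```

Under these assumptions the training set holds roughly 7,400 digit notes and the test set roughly 650, so the test set is a little under a tenth of the size of the training set.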

Nowadays sheet music has been digitized for preservation and distribution, but image-only digitization limits the possible applications of Music Information Retrieval (MIR). Symbolic representations, for example MIDI (MIDI association) and MusicXML (musicXML), by contrast encode music as musical entities. With those musical symbols, musical computing devices, such as MIDI instruments, can parse and play music without human intervention. To make this interpretation automatic, an OMR system provides an intelligent and convenient bridge that translates a score image into a symbolic representation. Because literature on OMR for numbered musical notation is hard to find, the authors refer only to the literature on staff notation, in terms of challenges (Bainbridge & Bell, 2001), issues (Rebelo et al., 2012), evaluation (Byrd & Simonsen, 2015), implementation (Bellini et al., 2001; Pugin et al., 2007; Tardón et al., 2009; Coüasnon & Camillerapp, 1994; Fornés et al., 2005; Ng & Boyle, 1996; Pinto et al., 2011; Rebelo et al., 2011; Toyama et al., 2006), and improvement with machine learning (Fujinaga, 1996; Pugin et al., 2007; Rossant & Bloch, 2007), looking for possible commonality between the notations.
