Itakura-Saito Nonnegative Factorizations of the Power Spectrogram for Music Signal Decomposition

Cédric Févotte (CNRS LTCI, TELECOM ParisTech, France)
Copyright: © 2011 | Pages: 31
DOI: 10.4018/978-1-61520-919-4.ch011


Nonnegative matrix factorization (NMF) is a popular linear regression technique in the fields of machine learning and signal/image processing. Much research on this topic has been driven by applications in audio. For example, NMF has been applied successfully to automatic music transcription and audio source separation, where the data is usually taken as the magnitude spectrogram of the sound signal, and the Euclidean distance or the Kullback-Leibler divergence is used as the measure of fit between the original spectrogram and its approximate factorization. In this chapter the authors give evidence of the relevance of factorizing the power spectrogram instead, with the Itakura-Saito (IS) divergence. Indeed, IS-NMF is shown to be connected to maximum likelihood inference of variance parameters in a well-defined statistical model of superimposed Gaussian components, and this model is in turn shown to be well suited to audio. Furthermore, the statistical setting opens the door to Bayesian approaches and to a variety of computational inference techniques. The authors discuss in particular model order selection strategies and Markov regularization of the activation matrix, to account for time persistence in audio. This chapter also discusses extensions of NMF to the multichannel case, for both instantaneous and convolutive mixtures, possibly underdetermined. The authors also present audio source separation results on a real stereo musical excerpt.
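To make the contrast between these measures of fit concrete, here is a minimal NumPy sketch (not code from the chapter; all function names are hypothetical). It checks two standard properties. First, the IS divergence is scale invariant, d_IS(λx | λy) = d_IS(x | y), whereas the squared Euclidean distance scales as λ² and the generalized KL divergence as λ, so with IS low-power time-frequency regions weigh as much as high-power ones. Second, for a zero-mean circular complex Gaussian coefficient x with variance v, the negative log-likelihood −log N_c(x | 0, v) differs from d_IS(|x|² | v) only by a term that does not depend on v, which is the link between IS-NMF and maximum likelihood estimation of variances.

```python
import numpy as np

def euclidean(x, y):
    """Half squared Euclidean distance, summed over entries."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(0.5 * np.sum(d ** 2))

def kl_divergence(x, y):
    """Generalized Kullback-Leibler divergence x*log(x/y) - x + y (x, y > 0)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum(x * np.log(x / y) - x + y))

def is_divergence(x, y):
    """Itakura-Saito divergence x/y - log(x/y) - 1 (x, y > 0)."""
    r = np.asarray(x, dtype=float) / np.asarray(y, dtype=float)
    return float(np.sum(r - np.log(r) - 1.0))

def complex_gaussian_nll(x, v):
    """-log N_c(x | 0, v) = log(pi * v) + |x|^2 / v for a circular complex Gaussian."""
    return float(np.log(np.pi * v) + np.abs(x) ** 2 / v)

# Scale behaviour: rescale both arguments by lam and compare the three measures.
x, y = 2.0, 3.0
for lam in (1.0, 10.0, 100.0):
    print(f"lam={lam:>6}: EUC={euclidean(lam * x, lam * y):.4g}  "
          f"KL={kl_divergence(lam * x, lam * y):.4g}  "
          f"IS={is_divergence(lam * x, lam * y):.4g}")

# Likelihood link: NLL(v) - d_IS(|x|^2 | v) is the same for every variance v.
coef = 1.5 + 0.5j        # hypothetical STFT coefficient
power = abs(coef) ** 2   # its power |x|^2
for v in (0.5, 1.0, 4.0):
    gap = complex_gaussian_nll(coef, v) - is_divergence(power, v)
    print(f"v={v}: gap={gap:.6f}")  # equals log(pi) + log(|x|^2) + 1
```

Because the gap is constant in v, minimizing the summed IS divergence between a power spectrogram and WH is equivalent to maximizing the Gaussian likelihood when each coefficient's variance is the corresponding entry of WH.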
Chapter Preview


Nonnegative matrix factorization (NMF) is a linear regression technique, employed for non-subtractive, part-based representation of nonnegative data. Given a data matrix V of dimensions F × N with nonnegative entries, NMF is the problem of finding a factorization

$V \approx WH$    (1)

where W and H are nonnegative matrices of dimensions F × K and K × N, respectively. K is usually chosen such that $FK + KN \ll FN$, hence reducing the data dimension. Early work on NMF includes (Paatero, 1997) and (Lee and Seung, 1999), the latter proving particularly influential. NMF has been applied to diverse problems (such as pattern recognition, clustering, data mining, source separation, collaborative filtering) in many areas (such as text processing, bioinformatics, signal/image processing, finance). Much research about NMF has been driven by applications in audio, namely automatic music transcription (Smaragdis and Brown, 2003; Abdallah and Plumbley, 2004) and source separation (Virtanen, 2007; Smaragdis, 2007), where the data V is usually taken as the magnitude spectrogram of the audio signal.
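The factorization in (1) can be computed in many ways. As one illustration in line with this chapter's focus, the following minimal NumPy sketch fits W and H with the multiplicative updates commonly used for NMF under the Itakura-Saito divergence. It is not presented as the chapter's exact algorithm: the random initialization, the fixed iteration count with no convergence check, and the synthetic data (a rank-K variance matrix multiplied by unit-mean exponential noise, mimicking the power of complex Gaussian coefficients) are assumptions chosen for illustration. The names `is_nmf` and `is_div` are hypothetical. Entries of V must be strictly positive, because the IS divergence involves log(V / WH).

```python
import numpy as np

def is_div(V, V_hat):
    """Itakura-Saito divergence between positive matrices, summed over entries."""
    R = V / V_hat
    return float(np.sum(R - np.log(R) - 1.0))

def is_nmf(V, K, n_iter=200, seed=0):
    """Approximate V (F x N, positive) by W @ H, with W (F x K) and H (K x N)
    nonnegative, using multiplicative updates for the IS divergence."""
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, K)) + 0.1   # strictly positive initialization
    H = rng.random((K, N)) + 0.1
    for _ in range(n_iter):
        V_hat = W @ H
        H *= (W.T @ (V / V_hat ** 2)) / (W.T @ (1.0 / V_hat))
        V_hat = W @ H
        W *= ((V / V_hat ** 2) @ H.T) / ((1.0 / V_hat) @ H.T)
    return W, H

# Synthetic "power spectrogram": rank-K variances times exponential noise.
rng = np.random.default_rng(1)
F, N, K = 64, 100, 4
V = (rng.random((F, K)) @ rng.random((K, N))) * rng.exponential(1.0, size=(F, N))
V += 1e-9  # guard against exact zeros before taking logs

W0, H0 = is_nmf(V, K, n_iter=0)   # same seed: the initialization only
W, H = is_nmf(V, K, n_iter=200)
print("FK + KN =", F * K + K * N, "vs FN =", F * N)  # 656 vs 6400
print("IS divergence at init:", is_div(V, W0 @ H0))
print("IS divergence after 200 iterations:", is_div(V, W @ H))
```

Multiplicative updates keep W and H nonnegative automatically, since each step multiplies the current factors by ratios of nonnegative quantities; this is one reason they are a common default for NMF.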

Like Vector Quantization (VQ), Principal Component Analysis (PCA) and Independent Component Analysis (ICA), NMF provides an unsupervised linear representation of data, in the sense that a data point $v_n$ (the nth column of V) is approximated as a linear combination of salient features (see Table 1).
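Column by column, this linear-combination reading of (1) can be written as follows, where $w_k$ denotes the kth column of W, $h_n$ the nth column of H and $h_{kn}$ the (k, n) entry of H (notation introduced here for illustration):

```latex
v_n \approx W h_n = \sum_{k=1}^{K} h_{kn} \, w_k , \qquad n = 1, \dots, N
```

Each column $w_k$ plays the role of a feature (a spectral pattern when V is a spectrogram), and $h_{kn}$ is its weight in data point n; Table 1 collects common names for these two kinds of quantities.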

Table 1.
W                                  H
“explanatory variables”            “regressors”
“basis”, “dictionary”              “expansion coefficients”
“patterns”                         “activation coefficients”
