Instantaneous Versus Convolutive Non-Negative Matrix Factorization: Models, Algorithms and Applications to Audio Pattern Separation

Wenwu Wang (University of Surrey, UK)
Copyright: © 2011 |Pages: 18
DOI: 10.4018/978-1-61520-919-4.ch015

Non-negative matrix factorization (NMF) is an emerging technique for data analysis and machine learning that aims to find low-rank representations of non-negative data. Early work on NMF was mainly based on the instantaneous model, i.e. a single basis matrix is used to represent the data. Recent work has shown that the instantaneous model may not be satisfactory for many audio applications. The convolutive NMF model, which has the advantage of revealing the temporal structure possessed by many signals, has been proposed in response. This chapter provides a brief overview of the models and algorithms for both instantaneous and convolutive NMF, with a focus on the theoretical analysis and performance evaluation of convolutive NMF algorithms and their application to audio pattern separation problems.
Chapter Preview


Since Lee and Seung published their seminal paper in 1999, non-negative matrix factorization (NMF) has attracted tremendous research interest over the last decade. Perhaps the earliest work on NMF is that of Paatero (1997); the technique was then made popular by Lee and Seung through their elegant multiplicative algorithms (Lee & Seung, 1999, Lee & Seung, 2001). The aim of NMF is to find latent structures or features within a dataset by representing a non-negative data matrix as a product of low-rank matrices. Lee and Seung (1999) found that NMF results in a “parts”-based representation because of the non-negativity constraint: only additive operations are allowed in the learning process. Although some later NMF algorithms involve mathematical operations that can produce negative elements in the low-rank matrices, non-negativity can still be ensured by a projection operation (Zdunek & Cichocki, 2007, Soltuz et al, 2008). Another interesting property of NMF is that the decomposed low-rank matrices are usually sparse, and the degree of their sparseness can be explicitly controlled in the algorithm (Hoyer, 2004). Thanks to these promising properties, NMF has been applied to many problems in data analysis, signal processing, computer vision, and pattern recognition; see, e.g., (Lee & Seung, 1999, Pauca et al, 2006, Smaragdis & Brown, 2003, Wang & Plumbley, 2005, Parry & Essa, 2007, FitzGerald et al, 2005, Wang et al, 2006, Zou et al, 2008, Wang et al, 2009, Cichocki et al, 2006b).
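As an illustration of the instantaneous model described above, the following is a minimal sketch of the multiplicative updates for the squared Euclidean distance in the style of Lee and Seung (2001). The function name `nmf_multiplicative`, the random initialisation, the iteration count, and the small constant `eps` (added to avoid division by zero) are assumptions made for this example, not details taken from the chapter.

```python
import numpy as np

def nmf_multiplicative(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative matrix V (n x m) by W @ H with
    W (n x rank) >= 0 and H (rank x m) >= 0, using multiplicative
    updates for the squared Euclidean distance ||V - W H||^2."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = V.shape
    # Strictly positive random initialisation (an illustrative choice).
    W = rng.random((n_rows, rank)) + eps
    H = rng.random((rank, n_cols)) + eps
    for _ in range(n_iter):
        # Each factor is multiplied element-wise by a non-negative ratio,
        # so W and H stay non-negative without any explicit projection.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Synthetic non-negative data with an exact rank-3 structure.
rng = np.random.default_rng(1)
V = rng.random((20, 3)) @ rng.random((3, 30))

W, H = nmf_multiplicative(V, rank=3)
rel_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_error:.4f}")
```

Because the updates only multiply by non-negative quantities, the factors remain non-negative throughout, which is the additive, parts-based behaviour noted above. In an audio setting, `V` would typically be a magnitude spectrogram rather than synthetic data.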

In machine audition and audio signal processing, NMF has also found applications in, for example, music transcription (Smaragdis & Brown, 2003, Wang et al, 2006) and audio source separation (Wang & Plumbley, 2005, Parry & Essa, 2007, FitzGerald et al, 2005, FitzGerald et al, 2006, Virtanen, 2007, Wang et al, 2009). In these applications, the raw audio data are usually transformed to the frequency domain to generate the spectrogram, i.e. the non-negative data matrix, which is then used as the input to the NMF algorithm. The instantaneous NMF model given in (Lee & Seung, 1999, Lee & Seung, 2001) has been shown to be satisfactory for certain audio tasks, provided that the spectral frequencies of the analyzed signal do not change dramatically over time (Smaragdis, 2004, Smaragdis, 2007, Wang, 2007, Wang et al, 2009). However, this is not the case for many realistic audio signals, whose frequencies do vary with time. The main limitation of the instantaneous NMF model is that only a single basis function is used, which is not sufficient to capture the temporal dependency of the frequency patterns within the signal. To address this issue, the convolutive NMF model (and similar methods known as shifted NMF) has been introduced (Smaragdis, 2004, Smaragdis, 2007, Virtanen, 2007, FitzGerald et al, 2005, Morup et al, 2007, Schmidt & Morup, 2006, O’Grady & Pearlmutter, 2006, Wang, 2007, Wang et al, 2009). In convolutive NMF, the data to be analyzed are modelled as a linear combination of shifted matrices, representing the time delays of multiple bases. Several algorithms have been developed based on this model, for example, the Kullback-Leibler (KL) divergence based multiplicative algorithm proposed in (Smaragdis, 2004, Smaragdis, 2007), the squared Euclidean distance based multiplicative algorithm proposed in (Wang, 2007, Wang et al, 2009), the two-dimensional deconvolution algorithms proposed in (Schmidt & Morup, 2006), the logarithmically scaled spectrogram decomposition algorithm in (FitzGerald et al, 2005), and the algorithm based on temporal continuity and sparseness constraints on the signals in (Virtanen, 2007).
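To make the "linear combination of shifted matrices" concrete, here is a minimal sketch of the convolutive reconstruction only (no learning step). It assumes the commonly used formulation in which the data are approximated by the sum over t of W(t) times H shifted t columns to the right with zero fill. The helper names `shift_right` and `conv_nmf_reconstruct`, the array layout, and the toy pattern are hypothetical choices made for this example.

```python
import numpy as np

def shift_right(H, t):
    """Shift the columns of H to the right by t frames, zero-filling on the left."""
    if t == 0:
        return H.copy()
    shifted = np.zeros_like(H)
    shifted[:, t:] = H[:, :-t]
    return shifted

def conv_nmf_reconstruct(W, H):
    """Convolutive model: sum over t of W[t] @ shift_right(H, t).

    W has shape (T, n_freq, rank): one basis matrix per time lag.
    H has shape (rank, n_frames): activations of each pattern over time.
    """
    return sum(W[t] @ shift_right(H, t) for t in range(W.shape[0]))

# One time-varying pattern spanning T = 3 frames: energy rises from
# frequency bin 0 to bin 2, which a single basis vector cannot describe.
T, n_freq, rank, n_frames = 3, 4, 1, 8
W = np.zeros((T, n_freq, rank))
W[0, 0, 0] = W[1, 1, 0] = W[2, 2, 0] = 1.0

# The pattern is activated at frames 1 and 5.
H = np.zeros((rank, n_frames))
H[0, 1] = H[0, 5] = 1.0

V_hat = conv_nmf_reconstruct(W, H)
print(V_hat)
```

With a single lag (T = 1) the sum collapses to one product W[0] @ H, i.e. the instantaneous model, which is why the convolutive model can be viewed as its extension.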
