Itakura-Saito Nonnegative Factorizations of the Power Spectrogram for Music Signal Decomposition

Cédric Févotte

Source Title: Machine Audition: Principles, Algorithms and Systems

ISBN13: 9781615209194 | ISBN10: 1615209190 | ISBN13 Softcover: 9781616923693 | EISBN13: 9781615209200

DOI: 10.4018/978-1-61520-919-4.ch011

MLA

Févotte, Cédric. "Itakura-Saito Nonnegative Factorizations of the Power Spectrogram for Music Signal Decomposition." Machine Audition: Principles, Algorithms and Systems, edited by Wenwu Wang, IGI Global, 2011, pp. 266-296. https://doi.org/10.4018/978-1-61520-919-4.ch011

APA

Févotte, C. (2011). Itakura-Saito Nonnegative Factorizations of the Power Spectrogram for Music Signal Decomposition. In W. Wang (Ed.), Machine Audition: Principles, Algorithms and Systems (pp. 266-296). IGI Global. https://doi.org/10.4018/978-1-61520-919-4.ch011

Chicago

Févotte, Cédric. "Itakura-Saito Nonnegative Factorizations of the Power Spectrogram for Music Signal Decomposition." In Machine Audition: Principles, Algorithms and Systems, edited by Wenwu Wang, 266-296. Hershey, PA: IGI Global, 2011. https://doi.org/10.4018/978-1-61520-919-4.ch011

Abstract

Nonnegative matrix factorization (NMF) is a popular linear regression technique in machine learning and signal/image processing. Much research on this topic has been driven by applications in audio. NMF has, for example, been applied successfully to automatic music transcription and audio source separation, where the data is usually taken to be the magnitude spectrogram of the sound signal, and the Euclidean distance or the Kullback-Leibler divergence is used as the measure of fit between the original spectrogram and its approximate factorization. In this chapter the authors give evidence of the relevance of factorizing the power spectrogram instead, with the Itakura-Saito (IS) divergence as the measure of fit. Indeed, IS-NMF is shown to be connected to maximum likelihood inference of variance parameters in a well-defined statistical model of superimposed Gaussian components, and this model is in turn shown to be well suited to audio. Furthermore, the statistical setting opens the door to Bayesian approaches and to a variety of computational inference techniques. The authors discuss in particular model order selection strategies and Markov regularization of the activation matrix, which accounts for time-persistence in audio. The chapter also discusses extensions of NMF to the multichannel case, for both instantaneous and convolutive recordings, possibly underdetermined. The authors present in particular audio source separation results on a real stereo musical excerpt.
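To make the idea of IS-NMF concrete, the following is a minimal sketch rather than the chapter's own implementation. It assumes the usual definition of the Itakura-Saito divergence, d_IS(x|y) = x/y - log(x/y) - 1, summed over all entries, and it uses the widely known multiplicative updates for this cost. The function names (is_divergence, is_nmf), the parameters (n_components, n_iter, eps), and the random toy matrix standing in for a power spectrogram are all assumptions made for illustration.

```python
import numpy as np


def is_divergence(V, V_hat):
    """Itakura-Saito divergence summed over all entries (inputs must be > 0)."""
    ratio = V / V_hat
    return float(np.sum(ratio - np.log(ratio) - 1.0))


def is_nmf(V, n_components, n_iter=200, seed=0, eps=1e-12):
    """Approximate V (F x N, strictly positive) by W @ H under the IS divergence.

    Uses standard multiplicative updates; this is an illustrative sketch,
    not necessarily the algorithm described in the chapter.
    """
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, n_components)) + eps   # spectral patterns (hypothetical name)
    H = rng.random((n_components, N)) + eps   # activations over time

    costs = [is_divergence(V, W @ H + eps)]   # cost before any update
    for _ in range(n_iter):
        V_hat = W @ H + eps
        H *= (W.T @ (V / V_hat**2)) / (W.T @ (1.0 / V_hat))
        V_hat = W @ H + eps
        W *= ((V / V_hat**2) @ H.T) / ((1.0 / V_hat) @ H.T)
        costs.append(is_divergence(V, W @ H + eps))
    return W, H, costs


# Toy data standing in for a power spectrogram (assumption: random, rank 3).
rng = np.random.default_rng(1)
V = rng.random((20, 3)) @ rng.random((3, 30)) + 1e-3
W, H, costs = is_nmf(V, n_components=3)
```

In practice V would be the squared magnitude of a short-time Fourier transform. One property worth noting is that the IS divergence is scale-invariant, d_IS(cx|cy) = d_IS(x|y) for any c > 0, so low-energy and high-energy spectrogram entries are weighted by their relative rather than absolute error.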