Multi-Channel Source Separation: Overview and Comparison of Mask-based and Linear Separation Algorithms

Nilesh Madhu (Ruhr-Universität Bochum, Germany) and André Gückel (Dolby Laboratories, Nürnberg, Germany)
Copyright: © 2011 | Pages: 39
DOI: 10.4018/978-1-61520-919-4.ch009


Machine-based multi-channel source separation in real-life situations is a challenging problem with a wide range of applications, from medical to military. With the increase in computational power available to everyday devices, real-time source separation has become more feasible, which has boosted research in this field in the recent past. Algorithms for source separation are based on specific assumptions regarding the source and signal model, which depends on the application. In this chapter, the specific application considered is target speaker enhancement in the presence of competing speakers and background noise. The aim of this contribution is not only to present an exhaustive overview of state-of-the-art separation algorithms and the specific models they are based upon, but also to highlight the relations between these algorithms, where possible. Given the wide scope of the chapter, we expect it will benefit both the student beginning studies in the field of machine audition and those already working in a related field who wish to obtain an overview of, or insights into, multi-channel source separation.
Chapter Preview

Separation Taxonomy

Multi-channel source separation algorithms rely on the spatial diversity afforded by microphone arrays to accomplish the goal of suppressing all sources other than the target source. They can be, broadly speaking, divided into two major categories: linear and non-linear.

Linear approaches to source separation attain target signal enhancement or interference cancellation by steering a spatial null in the direction of the interfering sources. The algorithms are based on a linear generative mixing model (the observed mixtures are assumed to be linear combinations of the individual source signals), whereupon null steering is obtained through a linear combination of the mixtures. Additional constraints may be placed on these algorithms to ensure that no target signal degradation takes place during the process. This process is also known as beamforming, and the corresponding spatial filters are termed beamformers.

Beamforming algorithms may be further sub-divided into two categories: those based explicitly on a spatial model and corresponding constraints, and those that depend upon specific long-term statistical characteristics of the source signals. Examples of the former include the generalized sidelobe canceller (GSC) (Griffiths & Jim, 1982) and its variants, which are primarily based on the second-order statistics (SOS) of the microphone signals. Examples of the latter class are the independent component analysis (ICA) based algorithms (e.g., Hyvärinen et al., 2001; Smaragdis, 1998; Saruwatari et al., 2006; Sawada et al., 2004), which use higher-order statistics (HOS) of the signals, and the simultaneous-decorrelation based algorithms (e.g., Fancourt & Parra, 2001; Buchner et al., 2004; Diamantaras, 2001; Ikram & Morgan, 2002), which are based on SOS. Note that while the algorithms of this latter class do not explicitly impose any spatial constraints, their separation filters form spatial exclusion regions along the interference directions, in effect functioning as beamformers. This aspect is examined in more detail in (Araki et al., 2003; Parra & Fancourt, 2002).
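As an illustration of the linear mixing model and of null steering by a linear combination of the mixtures, the following is a minimal sketch and not the chapter's own algorithm. It assumes a narrowband, far-field, anechoic model with two microphones and known source directions. The function names (steering_vector, null_steering_filter), the array spacing, the frequency, and the source angles are all hypothetical choices made for this example. The filter is constrained to pass the target with unit gain (no target degradation) while placing a null on the interferer.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def steering_vector(angle_deg, freq_hz, spacing_m, n_mics=2):
    """Far-field, narrowband steering vector of a uniform linear array."""
    delays = np.arange(n_mics) * spacing_m * np.sin(np.deg2rad(angle_deg)) / SPEED_OF_SOUND
    return np.exp(-2j * np.pi * freq_hz * delays)


def null_steering_filter(a_target, a_interferer):
    """Return the row vector w^H with w^H a_target = 1 and w^H a_interferer = 0."""
    A = np.column_stack([a_target, a_interferer])  # 2 x 2 mixing matrix
    # w^H A = [1, 0]  is equivalent to  A^T (w^H)^T = [1, 0]^T
    return np.linalg.solve(A.T, np.array([1.0, 0.0]))


# Hypothetical scenario: 1 kHz, 5 cm spacing, target at 0 degrees, interferer at 60 degrees.
freq_hz, spacing_m = 1000.0, 0.05
a_t = steering_vector(0.0, freq_hz, spacing_m)
a_i = steering_vector(60.0, freq_hz, spacing_m)

# Synthetic complex narrowband source signals (stand-ins for one frequency bin).
rng = np.random.default_rng(0)
s_target = rng.standard_normal(256) + 1j * rng.standard_normal(256)
s_interf = rng.standard_normal(256) + 1j * rng.standard_normal(256)

# Linear generative mixing model: each microphone observes a linear
# combination of the individual source signals.
X = np.outer(a_t, s_target) + np.outer(a_i, s_interf)

# Separation: a linear combination of the mixtures that nulls the interferer.
w_h = null_steering_filter(a_t, a_i)
y = w_h @ X

print("response towards target:    ", abs(w_h @ a_t))
print("response towards interferer:", abs(w_h @ a_i))
print("max deviation from target:  ", np.max(np.abs(y - s_target)))
```

In this idealized setting the output reproduces the target signal up to numerical precision. In a real room the steering vectors are not known exactly and the mixing is convolutive, which is one reason the statistical approaches mentioned above estimate the separation filters from the microphone signals instead.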
