Automatic Speaker Localization and Tracking: Using a Fusion of the Filtered Correlation with the Energy Differential

Siham Ouamour (USTHB University, Algeria), Halim Sayoud (USTHB University, Algeria) and Salah Khennouf (USTHB University, Algeria)
DOI: 10.4018/978-1-4666-0119-2.ch011

Abstract

This paper presents a speaker localization system intended for speaker tracking by camera. The authors use the signals from two microphones, placed opposite each other, to determine the position of the active speaker in order to supervise the audio-visual recording. To perform the localization task, the authors propose and employ two methods: the filtered correlation method and the energy differential method. The first method computes the correlation between the two signals collected by the microphones and applies a special filtering. The second computes the logarithmic energy differential between these two signals. When several methods are used simultaneously to make a decision, it is often worthwhile to combine their estimates or decisions with a fusion technique in order to improve system performance. To that end, this paper proposes two fusion techniques operating at the decision level, which fuse the two estimates into a single one that should be more precise.
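As a rough illustration of the two cues named in the abstract, the following minimal sketch computes a cross-correlation lag and a logarithmic energy differential for one pair of microphone frames. It is not the authors' implementation: the "special filtering" of the filtered correlation method and the two fusion techniques are not described in this preview, so they are left out. The function names, the decibel scaling, and the `eps` guard are assumptions made for illustration.

```python
import numpy as np


def correlation_lag(left, right):
    """Lag (in samples) at which the cross-correlation of the two channels peaks.

    A positive lag means the left channel is a delayed copy of the right one,
    i.e. the sound reached the right microphone first.
    Plain cross-correlation only; no filtering is applied (assumption).
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    xcorr = np.correlate(left, right, mode="full")
    # Lags covered by mode="full": -(len(right) - 1) ... len(left) - 1
    lags = np.arange(-(len(right) - 1), len(left))
    return int(lags[np.argmax(xcorr)])


def log_energy_differential(left, right, eps=1e-12):
    """Log ratio of the two channel energies, expressed in dB (assumed scaling).

    A positive value means the left channel carries more energy than the right.
    eps is a hypothetical guard against taking the log of zero on silent frames.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    e_left = np.sum(left ** 2)
    e_right = np.sum(right ** 2)
    return float(10.0 * np.log10((e_left + eps) / (e_right + eps)))
```

In this sketch, a speaker closer to the right microphone would tend to produce a positive lag and a negative energy differential; how the two cues are turned into a position decision and then fused is left to the methods of the chapter.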
Chapter Preview

Speech Database

We have built four experimental databases with different scenarios, different speakers and different configurations:

  • DB8 database: the distance between the two microphones is 4.20 m.

  • DB9 database: the distance between the two microphones is 2 m.

  • DB10 database: the distance between the two microphones is 1 m.

  • DB11 database: the distance between the two microphones is 1 m.

In this paper, we describe only the experiments carried out on the DB11 database, since the results obtained with the longer microphone distances (DB8 and DB9) are strongly affected by echo, and those obtained on DB10 are insufficient.

The DB11 database contains several scenarios with different speakers speaking alternately in a natural manner and with different configurations. There are two general configurations: a stable configuration and a mobile configuration. In the stable configuration, the speakers are seated at one of the 3 fixed positions, Left, Middle or Right (Figure 1.a and Figure 1.b), on the same line. In the mobile configuration, the speaker walks smoothly from one side to the other (e.g., from the left to the right). The distance between the two microphones is 1 m, the number of scenarios is 11, and the number of speakers is 7 (4 female and 3 male speakers).
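To give a sense of scale for the microphone spacings listed above, the following small sketch computes the largest possible arrival-time difference between the two microphones, which is the spacing divided by the speed of sound. The speed of sound (343 m/s, roughly dry air at 20 °C) and the 16 kHz sampling rate in the usage note are assumptions; the preview does not state the recording conditions or the sampling rate.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumption: dry air at about 20 degrees Celsius

# Microphone spacings in metres, as listed for the four databases
MIC_SPACING_M = {"DB8": 4.20, "DB9": 2.0, "DB10": 1.0, "DB11": 1.0}


def max_delay_seconds(spacing_m, speed_m_s=SPEED_OF_SOUND_M_S):
    """Upper bound on the inter-microphone delay for a given spacing."""
    return spacing_m / speed_m_s


def max_delay_samples(spacing_m, sample_rate_hz, speed_m_s=SPEED_OF_SOUND_M_S):
    """Same bound expressed in samples, rounded up to a whole sample."""
    return math.ceil(max_delay_seconds(spacing_m, speed_m_s) * sample_rate_hz)


if __name__ == "__main__":
    for name, spacing in MIC_SPACING_M.items():
        delay_ms = 1000.0 * max_delay_seconds(spacing)
        print(f"{name}: spacing {spacing} m, max delay {delay_ms:.2f} ms")
```

Under these assumptions, the 1 m spacing of DB11 bounds the delay at about 2.9 ms, so at a hypothetical 16 kHz sampling rate a correlation search would only need to cover lags of roughly ±47 samples.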
