Automatic Talker Identification Using Optimal Spectral Resolution: Application in noisy environment and telephony

Siham Ouamour, Halim Sayoud, Mhania Guerti
DOI: 10.4018/978-1-60960-563-6.ch014

Abstract

Results show the importance of high spectral resolution in noisy environments and in the telephone bandwidth, whereas existing research has always favoured a low resolution of 24 coefficients for such tasks. For example, the authors observe an improvement of about 11% in the identification score when the resolution is increased from 24 to 48 MFSC in the telephone bandwidth.

Speech Database

The speech database is extracted from TIMIT (Fisher, Zue, Bernstein & Pallet, 1986) and from FTIMIT, a version of TIMIT in which only the telephone bandwidth is preserved (Liu & Fu, 2007), corresponding to the 300-3400 Hz band (Magrin-Chagnolleau, Wilke, & Bimbot, 1996). There are 37 speakers: 22 male and 15 female. Each utterance lasts approximately 9 s for training and 7 s for testing. The recordings were made with a high-quality microphone, at 16 bits and with a sampling frequency of 16 kHz.
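As a rough illustration of what restricting a 16 kHz recording to the 300-3400 Hz band involves, the following sketch zeroes every frequency bin outside that band. It is not the procedure used to build FTIMIT, which the text does not describe: the ideal FFT mask and the function name `band_limit` are assumptions made for this example.

```python
import numpy as np


def band_limit(signal, sample_rate=16000, low_hz=300.0, high_hz=3400.0):
    """Keep only the [low_hz, high_hz] band using an ideal FFT mask.

    Hypothetical helper: a brick-wall mask is an assumption of this sketch,
    not the filter used to produce FTIMIT.
    """
    signal = np.asarray(signal, dtype=float)
    spectrum = np.fft.rfft(signal)
    # Frequency in Hz associated with each bin of the real FFT
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / sample_rate)
    mask = (freqs >= low_hz) & (freqs <= high_hz)
    # Zero the out-of-band bins and return to the time domain
    return np.fft.irfft(spectrum * mask, n=signal.size)


# Usage: a 1 kHz tone (inside the band) plus a 5 kHz tone (outside the band)
t = np.arange(16000) / 16000.0
in_band = np.sin(2 * np.pi * 1000 * t)
out_band = np.sin(2 * np.pi * 5000 * t)
filtered = band_limit(in_band + out_band)
```

In this toy case the 5 kHz component is removed and the 1 kHz component is left intact. A practical system would more likely use a smoother band-pass filter to avoid the ringing that an ideal mask introduces.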

A second investigation is carried out in a noisy environment (Sayoud, 2003) with three types of noise (Haque, Togneri & Zaknich, 2006; Hu & Loizou, 2007; Kim & Stern, 2006):

  • the Gaussian White Noise: GWN (Paninski, 2006),

  • the car noise (Jabloun & Enis Cetin, 1999),

  • the babble noise (Elhilali & Shamma, 2008).

These noises are added to both the training and the test utterances (Sayoud, 2003) at the following levels:

  • 0 dB,

  • 6 dB,

  • 12 dB,

  • 18 dB,

  • 24 dB (without noise).
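The mixing step can be sketched as follows, assuming the dB values above are signal-to-noise ratios (SNR) measured over the whole utterance. The function name `add_noise_at_snr`, the sine-wave stand-in for speech, and the use of synthetic white noise are all assumptions of this example. The text does not give the authors' exact procedure, and since it labels the 24 dB condition as "without noise", the sketch covers only the 0 to 18 dB levels.

```python
import numpy as np


def add_noise_at_snr(speech, noise, snr_db):
    """Return speech + scaled noise so the mixture has the requested SNR in dB.

    Hypothetical helper: SNR is computed over the whole signal, which is an
    assumption of this sketch.
    """
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)[: speech.size]
    if noise.size < speech.size:
        raise ValueError("noise must be at least as long as speech")
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Choose gain so that 10*log10(speech_power / (gain**2 * noise_power)) == snr_db
    gain = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise


# Usage: a sine wave stands in for an utterance, with Gaussian white noise
rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000.0)
white = rng.standard_normal(speech.size)
noisy = {snr: add_noise_at_snr(speech, white, snr) for snr in (0, 6, 12, 18)}
```

Car or babble noise would be handled the same way by passing a recorded noise segment in place of `white`.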
