Speech Recognition System Implementation of a Method Based on Wave Atom Transform and Frequency-Mel Cepstral Coefficients Using SVM

Walid Mohamed (University of Orleans, Orleans, France) and Yosssra Ben Fadhel (University of Tunis El Manar, Tunisia)
DOI: 10.4018/978-1-6684-4945-5.ch009

Abstract

In the field of human-machine interaction, automatic speech recognition (ASR) has been a prominent research area since the 1950s. Single-word speech recognition is widely used in voice command systems, which can be deployed in applications such as access control systems, robots, and voice-enabled devices. This study describes the implementation of a single-word speech recognition system on a Raspberry Pi 3 (RPi 3) board using the wave atom transform (WAT) and Mel-frequency cepstral coefficients (MFCC). The WAT-MFCC features are combined with a support vector machine (SVM) classifier. The experiment was conducted on a database of Arabic words, and the results showed that the proposed WAT-MFCC-SVM method is highly reliable, achieving a detection rate of 100% and a real-time factor (RTF) of 1.50.
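
The abstract names MFCC as one half of the feature extractor. For reference, the sketch below computes standard MFCCs for a single speech frame in plain Python. The steps are a Hamming window, a power spectrum, a triangular mel filterbank, log compression, and a DCT-II. This is a minimal sketch and not the authors' WAT-MFCC method: the wave atom transform stage and the SVM classifier are not shown. The function names and parameters are illustrative assumptions, not values taken from the chapter. These include the HTK-style mel formula, 26 filters, 13 coefficients, 8 kHz sampling, and 256-sample frames.

```python
import math

def hz_to_mel(f):
    # HTK-style mel scale (assumed; other mel formulas exist)
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Periodogram |X[k]|^2 / N of a Hamming-windowed frame, bins 0..N/2 (naive DFT)."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        spec.append((re * re + im * im) / n)
    return spec

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced uniformly on the mel scale from 0 Hz to Nyquist."""
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mel_points = [low + (high - low) * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate)) for m in mel_points]
    n_bins = n_fft // 2 + 1
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        filt = [0.0] * n_bins
        for k in range(left, center):      # rising edge
            filt[k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge
            filt[k] = (right - k) / (right - center)
        bank.append(filt)
    return bank

def mfcc_frame(frame, sample_rate, n_filters=26, n_ceps=13):
    """MFCCs of one frame: filterbank energies -> log -> DCT-II (unnormalized)."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # floor the energies so log() never sees zero
    energies = [max(sum(w * p for w, p in zip(filt, spec)), 1e-10) for filt in bank]
    log_e = [math.log(e) for e in energies]
    return [sum(log_e[m] * math.cos(math.pi * c * (m + 0.5) / n_filters)
                for m in range(n_filters))
            for c in range(n_ceps)]

if __name__ == "__main__":
    sr = 8000
    tone = [math.sin(2 * math.pi * 1000 * i / sr) for i in range(256)]  # 32 ms of a 1 kHz tone
    print([round(c, 2) for c in mfcc_frame(tone, sr)])
```

The naive DFT keeps the sketch dependency-free. A real implementation, especially on an RPi 3, would use an FFT and would frame the whole utterance with overlapping windows before stacking per-frame coefficients.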

Introduction

Speech is the simplest means for people to communicate their thoughts and feelings, so using it to control one's environment is an attractive prospect. This is why research in automatic speech recognition (ASR) is intensifying and steady progress is being made. Indeed, numerous studies have been carried out over the past decades to design an ideal speech recognition system that can understand single-word speech in real time from different speakers and in different environments. Nevertheless, achieving this ultimate goal remains an open requirement for recently developed ASR systems. The task is also difficult because of the large variations in the speech signal. Examples include the absence of clear boundaries between words or phonemes and the presence of unwanted noise. These variations stem from the diversity of speakers and their environments (gender, speaking rate, speaking style, dialect) (Norezmi et al., 2017).

Many ASR systems have been released to perform a variety of tasks, ranging from the simplest to the most complex, such as home automation (Rolon-Heredia et al., 2019). Furthermore, progress in ASR research is positively affecting the lives of people with disabilities and the elderly by providing quality support.

ASR tasks have been approached from various perspectives in the literature. Abushariah et al. (2023) discussed some of the challenges of ASR and gave an overview of many well-known approaches. In that work, the authors considered two feature extraction techniques: Mel-frequency cepstral coefficients (MFCC) and linear predictive coding (LPC) coefficients. They also considered three classification techniques: artificial neural networks (ANN), hidden Markov models (HMM), and dynamic time warping (DTW). Comparisons were then made between many ASR systems based on the extracted features and the classification techniques used. Moreover, Labied et al. (2021) cited many approaches used in both the preprocessing and feature extraction stages of ASR systems. In Kothandaraman et al. (2022), the authors presented different perspectives on the structure of ASR systems. They considered that these systems consist of numerous processing layers, each requiring multiple components, which leads to a large number of computations. They concluded that the error rate of ASR can be reduced by carefully selecting appropriate processing layers. In Ibrahim et al. (2017), the authors discussed both the ASR and text-to-speech (TTS) research areas. In the ASR part, they explored various aspects of speech classification, such as cepstrum-based feature extraction techniques, data compression, and HMMs. They also discussed various ways to increase robustness to noise. A discussion of ASR from the perspective of pattern recognition was also presented.
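
Of the classifiers listed above, DTW is the easiest to illustrate for isolated-word recognition. An unknown utterance is aligned against one stored template per word, and the word with the lowest cumulative alignment cost wins. The sketch below is a textbook DTW in plain Python with a Euclidean local cost and an unconstrained warping path. It is not taken from any of the cited systems, and the function names are hypothetical. The tiny one-dimensional sequences stand in for real per-frame MFCC vectors.

```python
import math

def dtw_distance(a, b):
    """Cumulative cost of the best warping path between two sequences of feature vectors."""
    n, m = len(a), len(b)
    inf = float("inf")
    # d[i][j] = minimal cost of aligning a[:i] with b[:j]
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between frames
            # allowed predecessor steps: vertical, horizontal, diagonal
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

def recognize(query, templates):
    """1-nearest-neighbour decision: the label whose template has the smallest DTW cost."""
    return min(templates, key=lambda label: dtw_distance(query, templates[label]))

if __name__ == "__main__":
    templates = {"yes": [[0.0], [1.0], [2.0], [3.0]],
                 "no": [[3.0], [2.0], [1.0], [0.0]]}
    query = [[0.0], [0.0], [1.0], [2.0], [3.0], [3.0]]  # "yes", spoken more slowly
    print(recognize(query, templates))  # prints "yes"
```

Without a warping window this costs O(nm) time and memory. Practical systems often add a band constraint such as the Sakoe-Chiba band to limit both cost and pathological alignments.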
