Recent Trends in Speech Recognition Systems

R. K. Aggarwal (National Institute of Technology Kurukshetra, India) and M. Dave (National Institute of Technology Kurukshetra, India)
DOI: 10.4018/978-1-4666-0954-9.ch006

Improving the accuracy and efficiency of automatic speech recognition (ASR) systems has long been a goal of researchers seeking to build natural-language human-machine communication interfaces. In the widely used statistical framework of ASR, a feature extraction technique is used at the front end to parameterize the speech signal, and a hidden Markov model (HMM) is used at the back end for pattern classification. This chapter reviews classical and recent approaches to Markov modeling and presents an empirical study of a few well-known methods in the context of a Hindi speech recognition system. Performance issues such as the number of Gaussian mixtures, tied states, and feature reduction procedures are also analyzed for a medium-size vocabulary. The experimental results show that more than 90% accuracy can be achieved using advanced acoustic modeling techniques. The recent advanced models outperform the conventional methods and are suitable for HCI applications.
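The abstract reports accuracy above 90% but does not define the metric at this point. Results produced with HTK are commonly reported as word accuracy, (N - D - S - I)/N, where N is the number of reference words and D, S and I are the deletions, substitutions and insertions found by aligning the recognizer output against the reference transcription. The sketch below assumes that definition (the chapter may use a different one) and aligns with a plain unit-cost edit distance. HTK's HResults tool weights error types differently, so its counts can differ on ambiguous alignments. The function names are hypothetical.

```python
def align_counts(ref, hyp):
    """Return (substitutions, deletions, insertions) from a minimum-edit alignment.

    Assumption: every error type costs 1 (HTK's HResults uses its own weights).
    """
    n, m = len(ref), len(hyp)
    # cost[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i  # delete every reference word
    for j in range(1, m + 1):
        cost[0][j] = j  # insert every hypothesis word
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + int(ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Backtrace from the bottom-right corner, counting each error type.
    subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + int(ref[i - 1] != hyp[j - 1]):
            subs += int(ref[i - 1] != hyp[j - 1])  # match adds 0, substitution adds 1
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return subs, dels, ins


def word_accuracy(ref, hyp):
    """HTK-style scores: %Correct = (N-D-S)/N and Accuracy = (N-D-S-I)/N."""
    s, d, i = align_counts(ref, hyp)
    n = len(ref)
    correct = (n - d - s) / n
    accuracy = (n - d - s - i) / n
    return correct, accuracy
```

For example, scoring the hypothesis "a x c d e" against the reference "a b c d" finds one substitution and one insertion, giving %Correct = 0.75 but Accuracy = 0.5. Insertions lower accuracy without lowering %Correct, which is why the two figures are usually reported side by side.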
Chapter Preview

1. Introduction

Human-computer interaction through a natural-language conversational interface plays an important role in making computers easier for the common man to use. The success of such a speech-enabled human-machine communication interface depends mainly on the performance of the automatic speech recognition system. State-of-the-art ASR systems use a statistical pattern classification approach with two well-known phases: feature extraction and pattern classification.

In the ASR architecture, the feature extraction phase belongs to the front end, which converts the recorded waveform into an acoustic representation known as feature vectors. The back end covers the statistical models used for classification, such as acoustic models and language models, along with search methods and adaptation techniques. The features are based on a time-frequency representation of the acoustic signal and are computed at regular intervals (e.g., every 10 ms). At the back end, the feature vectors are decoded into linguistic units such as words, syllables, and phones with the help of hidden Markov models (HMMs). For classification, HMMs use either multivariate Gaussian mixtures or artificial neural networks to emit a state-dependent likelihood or posterior probability on a frame-by-frame basis.
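The Gaussian-mixture variant of the back end can be illustrated with a minimal sketch: each HMM state scores a feature vector with a mixture of Gaussians, and Viterbi decoding selects the best state sequence frame by frame. Diagonal covariances, log-domain arithmetic, plain Python lists and all function names are assumptions of this illustration. It is not the HTK implementation used in the chapter, and it omits language models, pruning and tied states.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at feature vector x."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def log_gmm(x, weights, means, variances):
    """State emission log-likelihood log b_j(x) of a Gaussian mixture (log-sum-exp)."""
    terms = [math.log(w) + log_gauss_diag(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def viterbi(frames, states, log_trans, log_init):
    """Most likely HMM state sequence for a sequence of feature vectors.

    states: one (weights, means, variances) mixture per HMM state.
    log_trans[i][j]: log probability of moving from state i to state j
                     (use -math.inf for forbidden transitions).
    log_init[j]: log probability of starting in state j.
    """
    n = len(states)
    # delta[j]: best log score of any path ending in state j at the current frame
    delta = [log_init[j] + log_gmm(frames[0], *states[j]) for j in range(n)]
    backptrs = []
    for x in frames[1:]:
        emit = [log_gmm(x, *states[j]) for j in range(n)]
        new_delta, ptr = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: delta[i] + log_trans[i][j])
            ptr.append(best_i)
            new_delta.append(delta[best_i] + log_trans[best_i][j] + emit[j])
        delta = new_delta
        backptrs.append(ptr)

    # Trace back from the best final state.
    state = max(range(n), key=lambda j: delta[j])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1], max(delta)
```

For example, take a two-state left-to-right model whose one-dimensional single-Gaussian states are centered at 0 and 5 with unit variance, with log_trans = [[log 0.5, log 0.5], [-inf, 0]] and log_init = [0, -inf]. The frames [0.1], [-0.2], [4.9], [5.2] then decode to the state path [0, 0, 1, 1]. Working in the log domain avoids the numerical underflow that multiplying many small frame likelihoods would cause.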

This chapter reviews and compares the existing statistical techniques (i.e., various types of HMMs) that have been used for acoustic-phonetic modeling in ASR, in the context of the Hindi language. The stochastic models are covered in three categories: conventional techniques, refinements, and recently proposed methods. Experiments are performed under normal field conditions as well as in noisy environments, using the well-known tools HTK 3.4.1 (Cambridge University, 2011) and MATLAB. A lack of resources such as speech and text corpora is the major hurdle in speech research for Hindi or any other Indian language. Unfortunately, no standard Hindi database is yet available for public use. Since databases from non-Indian languages cannot be used for Hindi (owing to language-specific effects), we used a self-developed corpus that includes documents from popular Hindi newspapers. The system uses the PLP and PLP-RASTA techniques for feature extraction at the front end.
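The distinguishing step of PLP-RASTA is RASTA filtering. Each critical-band log-energy trajectory is passed through a band-pass filter along time, which suppresses slowly varying components such as a fixed channel gain (a constant offset in the log domain). The sketch below shows only that step. It uses the commonly cited transfer function H(z) = 0.1 z^4 (2 + z^-1 - z^-3 - 2z^-4) / (1 - 0.98 z^-1) in causal form (without the z^4 advance) with zero initial conditions. The pole value, the frame-major input layout and the function names are assumptions, and the other PLP stages (critical-band integration, equal-loudness weighting, cube-root compression, all-pole modeling) are omitted.

```python
def rasta_filter(trajectory, pole=0.98):
    """Apply a RASTA-style band-pass IIR filter to one band's log-energy trajectory.

    The numerator 0.1 * [2, 1, 0, -1, -2] sums to zero, so a constant offset is
    rejected. The single pole at `pole` sets the low-frequency cutoff. Causal form
    with zero initial conditions (assumptions of this sketch).
    """
    b = [0.2, 0.1, 0.0, -0.1, -0.2]  # numerator taps for x[n], x[n-1], ..., x[n-4]
    out = []
    prev = 0.0  # y[n-1]
    for n in range(len(trajectory)):
        acc = sum(b[k] * trajectory[n - k] for k in range(len(b)) if n - k >= 0)
        prev = acc + pole * prev  # y[n] = sum_k b_k x[n-k] + pole * y[n-1]
        out.append(prev)
    return out


def rasta_filter_bands(log_spectrogram, pole=0.98):
    """Filter every band independently along time.

    log_spectrogram: list of frames, each a list of per-band log energies
    (hypothetical layout chosen for this sketch).
    """
    n_bands = len(log_spectrogram[0])
    filtered = [rasta_filter([frame[band] for frame in log_spectrogram], pole)
                for band in range(n_bands)]
    # Transpose back to frame-major order.
    return [list(frame) for frame in zip(*filtered)]
```

Feeding a constant trajectory, for example rasta_filter([1.0] * 500), produces a short transient that then decays toward zero. This is the intended removal of a stationary log-domain offset. Some implementations use a different pole value (0.94 is also seen).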

The rest of the chapter is organized as follows. Section 2 presents the role of speech recognition in HCI, together with the architecture and working of ASR. The classical approach to acoustic-phonetic modeling is discussed in Section 3, covering the structure of the HMM, discrete and continuous HMMs, HMM modeling units, and pronunciation adaptation. Section 4 presents refinements of the HMM (variable-duration HMMs and discriminative techniques) and advancements proposed by various researchers to overcome the limitations of the standard HMM, such as large-margin and soft-margin HMMs (based on support vector machines), the dual-stream approach, and HMMs with wavelet networks. Feature extraction and reduction techniques are covered in Section 5. Section 6 explains ASR challenges and optimization. Experimental results are analyzed in Section 7, and conclusions are drawn in Section 8.

Complete Chapter List

Editorial Advisory Board
Table of Contents
Uma Shanker Tiwary, Tanveer J. Siddiqui
Chapter 1: A Brief Survey on User Modelling in Human Computer Interaction
Pradipta Biswas
This chapter presents a brief survey of different user modelling techniques used in human computer interaction. It investigates history of...
Chapter 2: Working Together with Computers: Towards a General Framework for Collaborative Human Computer Interaction
Uma Shanker Tiwary, Tanveer J. Siddiqui
The objective of this chapter is twofold. On one hand, it tries to introduce and present various components of Human Computer Interaction (HCI), if...
Chapter 3: Interactive and Cognitive Models in the Social Network Environment for Designing
Hung-Pin Hsu
In recent years, Metaverse has become a new type of social network. It provides an integrated platform and interactive environment for users to...
Chapter 4: Fuzzy Linguistic Modelling in Multi Modal Human Computer Interaction: Adaptation to Cognitive Styles using Multi Level Fuzzy Granulation Method
Ilham N. Huseyinov
The purpose of this chapter is to explore fuzzy logic based methodology for computing an adaptive interface in an environment of imperfect, vague...
Chapter 5: Improving Audio Spatialization Using Customizable Pinna Based Anthropometric Model of Head-Related Transfer Functions
Navarun Gupta, Armando Barreto
The role of binaural and immersive sound is becoming crucial in virtual reality and HCI related systems. This chapter proposes a structural model...
Chapter 6: Recent Trends in Speech Recognition Systems
R. K. Aggarwal, M. Dave
Ways of improving the accuracy and efficiency of automatic speech recognition (ASR) systems have been a long term goal of researchers to develop the...
Chapter 7: Issues in Spoken Dialogue Systems for Human-Computer Interaction
Tanveer J. Siddiqui, Uma Shanker Tiwary
Spoken dialogue systems are a step forward towards the realization of human-like interaction with computer-based systems. This chapter focuses on...
Chapter 8: Enhancing Robustness in Speech Recognition using Visual Information
Omar Farooq, Sekharjit Datta
The area of speech recognition has been thoroughly researched during the past fifty years; however, robustness is still an important challenge to...
Chapter 9: On Applying the Farey Sequence for Shape Representation in Z²
Sanjoy Pratihar, Partha Bhowmick
Describing the shape of an object is a well-studied, yet ever-engrossing problem, because an appropriate description can improve the efficiency of a...
Chapter 10: Multi Finger Gesture Recognition and Classification in Dynamic Environment under Varying Illumination upon Arbitrary Background
Armin Mustafa, K.S. Venkatesh
This chapter aims to develop an ‘accessory-free’ or ‘minimum accessory’ interface used for communication and computation without the requirement of...
Chapter 11: Human Computer Interaction for Effective Metasearching
Rashid Ali, M. M. Sufyan Beg
Metasearching is the process of combining search results of different search systems into a single set of ranked results which, in turn, is expected...
Chapter 12: Science of Emoticons: Research Framework and State of the Art in Analysis of kaomoji-type Emoticons
Michal Ptaszynski, Jacek Maciejewski, Pawel Dybala, Rafal Rzepka, Kenji Araki, Yoshio Momouchi
Emoticons are string of symbols representing body language in text-based communication. For a long time they have been considered as unnatural...
Chapter 13: On the Development of Adaptive and User-Centred Interactive Multimodal Interfaces
David Griol, Zoraida Callejas, Ramón López-Cózar, Gonzalo Espejo, Nieves Ábalos
Multimodal systems have attained increased attention in recent years, which has made possible important improvements in the technologies for...
Chapter 14: A Survey of Mobile Vision Recognition Applications
Andrew Molineux, Keith Cheverst
In recent years, vision recognition applications have made the transition from desktop computers to mobile phones. This has allowed a new range of...
Chapter 15: Speech Disorders Recognition using Speech Analysis
Khaled Necibi, Halima Bahi, Toufik Sari
Speech disorders are human disabilities widely present in young population but also adults may suffer from such disorders after some physical...
About the Contributors