Audio and Visual Speech Recognition Recent Trends

Lee Hao Wei (Sunway University, Malaysia), Seng Kah Phooi (Sunway University, Malaysia) and Ang Li-Minn (Edith Cowan University, Australia)
DOI: 10.4018/978-1-4666-3958-4.ch002

This chapter gives a brief introduction to the origins of audio-visual speech recognition and to the techniques researchers in the field commonly use. It covers the background theory behind common feature extraction and classification methods for both audio and visual processing, with emphasis on Mel-Frequency Cepstral Coefficients (MFCCs) and on contour- and geometry-based lip feature extraction with the corresponding tracking methods (Yingjie, Haiyan, Yingjie, & Jinyang, 2011; Liu & Cheung, 2011). The proposed solution concepts include time derivatives of mel-frequency cepstral coefficients for audio feature extraction, chroma-colour-based (YCbCr) face segmentation, feature point extraction, a localized active contour tracking algorithm, and Hidden Markov Models with the Viterbi algorithm. The chapter is intended to be informative for newcomers to speech processing, but it is not sufficient on its own for mastery of the field; the additional suggested reading should help readers get there faster.
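As a rough illustration of the last item in that list, the sketch below shows Viterbi decoding for a small discrete Hidden Markov Model. It is not the chapter's implementation: the function name `viterbi`, the two states (`silence`, `speech`), the three quantized observation symbols (`low`, `mid`, `high`), and all probabilities are hypothetical values chosen only to make the example runnable. A real recognizer would typically use continuous feature vectors such as MFCCs rather than three discrete symbols.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (best_path, log_probability) for a discrete HMM."""

    def log(p):
        # Work in log space so long sequences do not underflow.
        return math.log(p) if p > 0 else float("-inf")

    # scores[t][s]: best log-probability of any path ending in state s at time t.
    scores = [{s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}]
    backptr = [{}]

    for t in range(1, len(observations)):
        scores.append({})
        backptr.append({})
        for s in states:
            # Pick the predecessor state that maximizes the path score into s.
            prev, best = max(
                ((p, scores[t - 1][p] + log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            scores[t][s] = best + log(emit_p[s][observations[t]])
            backptr[t][s] = prev

    # Backtrack from the best final state to recover the full path.
    last = max(states, key=lambda s: scores[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(backptr[t][path[-1]])
    path.reverse()
    return path, scores[-1][last]


# Hypothetical toy model (all values are assumptions for illustration only).
states = ["silence", "speech"]
start_p = {"silence": 0.6, "speech": 0.4}
trans_p = {
    "silence": {"silence": 0.7, "speech": 0.3},
    "speech": {"silence": 0.4, "speech": 0.6},
}
emit_p = {
    "silence": {"low": 0.5, "mid": 0.4, "high": 0.1},
    "speech": {"low": 0.1, "mid": 0.3, "high": 0.6},
}

path, logp = viterbi(["low", "mid", "high"], states, start_p, trans_p, emit_p)
print(path, math.exp(logp))
```

For this toy model the most likely state sequence is `silence`, `silence`, `speech`, with a path probability of about 0.01512. Keeping back-pointers at each time step is what lets the algorithm recover the whole sequence in a single backward pass instead of enumerating every possible path.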
Chapter Preview


The complexity of a speech recognition system often overwhelms novice researchers, because a working system requires techniques and algorithms from several different research fields. Speech recognition tasks can be ranked by difficulty according to the type of speech involved: isolated words are the simplest, followed by connected words and continuous speech, with spontaneous speech the most difficult to implement (Anusuya & Katti, 2009).
