Hidden Markov Model Based Visemes Recognition, Part II: Discriminative Approaches

Say Wei Foo (Nanyang Technological University, Singapore) and Liang Dong (Nanyang Technological University, Singapore)
Copyright: © 2009 | Pages: 32
DOI: 10.4018/978-1-60566-186-5.ch012

Abstract

The basic building blocks of visual speech are visemes. Unlike phonemes, however, visemes are easily confused with one another and are readily distorted by the contexts in which they appear. Classifiers capable of distinguishing the minute differences among viseme categories are therefore desirable. In this chapter, we describe two Hidden Markov Model (HMM) based techniques that use a discriminative approach to increase the accuracy of visual speech recognition: the Maximum Separable Distance (MSD) training strategy (Dong, 2005) and the Two-channel training approach (Dong, 2005; Foo, 2003; Foo, 2002). Both adopt a proposed criterion function, called separable distance, to improve the discriminative power of an HMM. The methods are applied to identify confusable visemes. Experimental results indicate that these approaches attain higher recognition accuracy than conventional HMMs.

Introduction

Viseme Classifiers for Fine Discrimination

In the previous chapter, we described an AdaBoost-HMM classifier that deals with the variations of a particular viseme appearing in different contexts. However, that method does not specifically address fine discrimination between different phonemes that may be confusable.

In the MPEG-4 multimedia standard, the mapping from phonemes to visemes is many-to-one (Tekalp, 2000). For example, the mouth shapes produced for /f/ and /v/ differ only subtly, so the two phonemes are clustered into a single viseme category. A classifier able to distinguish such small differences would greatly improve the accuracy of visual speech recognition.
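To make the many-to-one mapping concrete, the Python sketch below stores a small phoneme-to-viseme table, inverts it, and lists the phoneme pairs that share a viseme, i.e. the pairs a viseme-level classifier cannot separate. Only the /f/, /v/ grouping comes from the text; the other groups, the descriptive labels, and the helper names (`group_by_viseme`, `confusable_pairs`) are hypothetical and are not the MPEG-4 table itself.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical partial phoneme-to-viseme table. Only the /f/, /v/ group is
# taken from the text; the other groups and all labels are illustrative.
PHONEME_TO_VISEME = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "t": "alveolar", "d": "alveolar",
    "k": "velar", "g": "velar",
}


def group_by_viseme(mapping):
    """Invert the many-to-one map: viseme -> list of phonemes sharing it."""
    groups = defaultdict(list)
    for phoneme, viseme in mapping.items():
        groups[viseme].append(phoneme)
    return dict(groups)


def confusable_pairs(mapping):
    """Phoneme pairs that map to the same viseme and so look alike on the lips."""
    pairs = []
    for phonemes in group_by_viseme(mapping).values():
        pairs.extend(combinations(phonemes, 2))
    return pairs


print(group_by_viseme(PHONEME_TO_VISEME)["labiodental"])  # ['f', 'v']
print(confusable_pairs(PHONEME_TO_VISEME))
```

Once phonemes are collapsed into a shared viseme class, a classifier operating only at that level cannot tell the members of a group apart; separating such confusable classes is what the discriminative training in this chapter aims at.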

To train a single-HMM classifier, the Baum-Welch algorithm, which is based on the Maximum Likelihood (ML) criterion (Rabiner, 1993), is popularly adopted. With this approach, however, the parameters of each HMM are determined solely by its correct samples; the relationship between the correct samples and the incorrect ones is not taken into account. In its original form, the method is therefore not designed for fine discrimination. One solution to this problem is to adopt a training strategy that maximizes mutual information, referred to as Maximum Mutual Information (MMI) estimation (Bahl, 1986). MMI estimation increases the a posteriori probability of the model corresponding to the training data, and thus directly targets the overall discriminative power of the resulting HMMs. However, MMI estimation is difficult to implement and computationally demanding. A computationally less intensive metric and approach are desirable.
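The difference between the two criteria can be sketched in a few lines of Python for discrete HMMs. This is a minimal illustration under stated assumptions, not the chapter's method: it only evaluates the objectives (no Baum-Welch or MMI re-estimation), it uses the standard conditional log-posterior form of MMI, and the toy models, their numbers, and the function names are all hypothetical. The ML score of a model depends only on that model and its own samples, whereas the MMI score changes when a competing model changes.

```python
import math


def forward_likelihood(obs, pi, A, B):
    """Return P(O | lambda) for a discrete HMM using the forward algorithm.

    pi[i]:   initial probability of state i
    A[i][j]: probability of moving from state i to state j
    B[i][k]: probability of emitting symbol k in state i
    """
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)


def ml_objective(samples, model):
    """ML criterion: log-likelihood of one model on its own (correct) samples only."""
    return sum(math.log(forward_likelihood(o, *model)) for o in samples)


def mmi_objective(labelled, models, priors):
    """MMI-style criterion: sum of log posteriors log P(c | O) of the true class c.

    Every model appears in the denominator, so the competing (incorrect)
    classes influence the value.
    """
    total = 0.0
    for obs, c in labelled:
        joint = [priors[k] * forward_likelihood(obs, *models[k])
                 for k in range(len(models))]
        total += math.log(joint[c] / sum(joint))
    return total


# Hypothetical 2-state models over the symbols {0, 1}; every number is made up.
model_a = ([0.6, 0.4], [[0.7, 0.3], [0.4, 0.6]], [[0.9, 0.1], [0.2, 0.8]])
model_b = ([0.5, 0.5], [[0.6, 0.4], [0.3, 0.7]], [[0.8, 0.2], [0.3, 0.7]])
model_b2 = ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.95, 0.05], [0.5, 0.5]])
labelled = [([0, 0, 1], 0), ([1, 1, 0], 1)]  # (observation sequence, class index)

# The ML score of model_a never looks at the competing model.
own_a = [obs for obs, c in labelled if c == 0]
print(ml_objective(own_a, model_a))

# The MMI score changes when the competitor is replaced (model_b -> model_b2).
print(mmi_objective(labelled, [model_a, model_b], [0.5, 0.5]))
print(mmi_objective(labelled, [model_a, model_b2], [0.5, 0.5]))
```

Because every class model enters the MMI denominator, maximizing it requires evaluating all competing models for each training sample, which hints at why MMI training is more costly than per-model ML training.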

In this chapter, we describe two Hidden Markov Model based techniques for increasing the discriminative power of visual speech recognition: the Maximum Separable Distance (MSD) training strategy (Dong, 2005) and the Two-channel training approach (Dong, 2005; Foo, 2003; Foo, 2002).

Organization of the Chapter

The organization of the chapter is as follows. The proposed metric, separable distance, is described in Section 2. The MSD HMM and the Two-channel HMM are presented in Sections 3 and 4, respectively. Concluding remarks are given in Section 5, and suggestions for future work are outlined in Section 6.
