Viseme Classifiers for Fine Discrimination
In the previous chapter, we described an AdaBoost-HMM classifier that handles the variations of a particular viseme appearing in different contexts. However, that method does not specifically address fine discrimination between different phonemes that are easily confused.
In the MPEG-4 multimedia standard, the relationship between phonemes and visemes is a many-to-one mapping (Tekalp, 2000). For example, the mouth shapes produced for the phonemes /f/ and /v/ differ only subtly, and the two are therefore clustered into one viseme category. A classifier able to distinguish such small differences would greatly improve the accuracy of visual speech recognition.
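The many-to-one relationship can be pictured as a lookup table. The following is only a minimal sketch: the /f/ and /v/ grouping comes from the text above, while the bilabial group, the viseme labels, and the helper name `confusable_phonemes` are illustrative assumptions and are not the MPEG-4 viseme indices.

```python
# Illustrative many-to-one phoneme-to-viseme table.
# The labels are hypothetical names, not MPEG-4 viseme numbers.
PHONEME_TO_VISEME = {
    "f": "labiodental",
    "v": "labiodental",
    "p": "bilabial",
    "b": "bilabial",
    "m": "bilabial",
}


def confusable_phonemes(phoneme):
    """Return the other phonemes that share a viseme with `phoneme`."""
    viseme = PHONEME_TO_VISEME[phoneme]
    return sorted(
        p for p, v in PHONEME_TO_VISEME.items() if v == viseme and p != phoneme
    )


if __name__ == "__main__":
    # Phonemes that a viseme-level classifier cannot tell apart from /f/.
    print(confusable_phonemes("f"))
```

Every phoneme returned by such a lookup is one that a viseme-level classifier cannot separate from the query phoneme, which is the gap that fine discrimination aims to close.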
To train a single-HMM classifier, the Baum-Welch algorithm, which is based on the Maximum Likelihood (ML) criterion (Rabiner, 1993), is widely used. However, the parameters of the HMM are then determined solely by the correct samples, while the relationship between the correct samples and the incorrect ones is not taken into consideration. The method, in its original form, is thus not designed for fine recognition. One solution to this problem is to adopt a training strategy that maximizes the mutual information, referred to as Maximum Mutual Information (MMI) estimation (Bahl, 1986). It increases the a posteriori probability of the model corresponding to the training data and thereby improves the overall discriminative power of the resulting HMM. However, such a strategy is difficult to realize, and implementing MMI estimation is tedious. A computationally less intensive metric and approach are therefore desirable.
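The contrast between the two criteria can be written out in their commonly stated textbook form. The notation here is an assumption introduced for illustration and is not taken from this chapter: $O^{c}$ is a training sequence of class $c$, $\lambda_w$ is the HMM of class $w$, and $P(w)$ is the prior probability of class $w$.

```latex
\begin{align}
\hat{\lambda}_c^{\mathrm{ML}}
  &= \arg\max_{\lambda_c} \, \log P(O^{c} \mid \lambda_c), \\
I(O^{c}, \lambda_c)
  &= \log P(O^{c} \mid \lambda_c)
   - \log \sum_{w} P(O^{c} \mid \lambda_w)\, P(w).
\end{align}
```

The ML criterion involves only the model of the correct class. The MMI objective $I$ additionally subtracts a term that sums over all competing models, which is what makes it discriminative and also what makes its optimization more expensive.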
In this chapter, we describe two Hidden Markov Model (HMM) based techniques that increase the discriminative power of visual speech recognition. We refer to them as the Maximum Separable Distance (MSD) training strategy (Dong, 2005) and the Two-channel training approach (Dong, 2005; Foo, 2003; Foo, 2002).
Organization of the Chapter
The chapter is organized as follows. The proposed metric, the Maximum Separable Distance (MSD), is described in Section 2. The MSD HMM and the Two-channel HMM are presented in Sections 3 and 4, respectively. Concluding remarks are given in Section 5, and some suggestions for future work are outlined in Section 6.