Modeling and Synthesis of Realistic Visual Speech in 3D
Gregor A. Kalberer (BIWI – Computer Vision Lab, Switzerland), Pascal Müller (BIWI – Computer Vision Lab, Switzerland) and Luc Van Gool (BIWI – Computer Vision Lab, Switzerland)
Copyright: © 2004
Realistic face animation remains a difficult problem, and this difficulty hampers further progress in several high-tech domains: special effects in movies, the use of 3D face models in communications, avatars and likenesses in virtual reality, and games with more subtle scenarios. This work attempts to improve on the current state of the art in face animation, especially for the creation of highly realistic lip and speech-related motions. To that end, 3D models of faces are used and, based on the latest technology, speech-related 3D face motion is learned from examples. Thus, the chapter subscribes to the surging field of image-based modeling and widens its scope to include animation. The exploitation of detailed 3D motion sequences is quite unique and narrows the gap between modeling and animation. From measured 3D face deformations around the mouth area, typical motions are extracted for different “visemes”. Visemes are the basic motion patterns observed for speech and are comparable to the phonemes of auditory speech. The visemes are studied in sufficient detail to also cover natural variations and differences between individuals. Furthermore, the transitions between visemes are analyzed in terms of co-articulation effects, i.e., the visual blending of visemes as required for fluent, natural speech. The work presented in this chapter also encompasses the animation of faces for which no visemes have been observed and extracted. This “transplantation” of visemes to novel faces, for which only a static 3D model is available, allows such faces to be animated without an extensive learning procedure for each individual.
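The visual blending of visemes during a transition can be illustrated with a minimal sketch. It assumes that each viseme is stored as per-vertex 3D displacements from a neutral face and that a transition is a plain linear cross-fade between two visemes; the chapter's own co-articulation model may well be more elaborate. All names (`blend_visemes`, `crossfade`, the viseme labels and the toy data) are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


def blend_visemes(neutral: List[Vec3],
                  visemes: Dict[str, List[Vec3]],
                  weights: Dict[str, float]) -> List[Vec3]:
    """Add a weighted sum of viseme displacements to the neutral face."""
    result = []
    for i, (x, y, z) in enumerate(neutral):
        for name, w in weights.items():
            dx, dy, dz = visemes[name][i]
            x, y, z = x + w * dx, y + w * dy, z + w * dz
        result.append((x, y, z))
    return result


def crossfade(t: float, src: str, dst: str) -> Dict[str, float]:
    """Linear cross-fade weights for t in [0, 1] (assumed transition model)."""
    t = min(max(t, 0.0), 1.0)  # clamp t to the valid range
    return {src: 1.0 - t, dst: t}


# Toy data (assumption): two mouth vertices and two made-up visemes.
neutral = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
visemes = {
    "open": [(0.0, -1.0, 0.0), (0.0, -1.0, 0.0)],
    "round": [(0.5, 0.0, 0.0), (-0.5, 0.0, 0.0)],
}

# Mouth shape halfway through the transition from "open" to "round".
halfway = blend_visemes(neutral, visemes, crossfade(0.5, "open", "round"))
print(halfway)
```

Storing visemes as displacements rather than absolute shapes keeps the blend independent of the neutral geometry, which is also what would make it conceivable to apply the same displacements to a different static face model, subject to a suitable mapping between the two meshes.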