Visual Speech Perception, Optical Phonetics, and Synthetic Speech
Lynne E. Bernstein (House Ear Institute, USA) and Jintao Jiang (House Ear Institute, USA)
Copyright © 2009 | Pages: 23
DOI: 10.4018/978-1-60566-186-5.ch015


The information in optical speech signals is phonetically impoverished compared to the information in acoustic speech signals presented under good listening conditions. Yet the high lipreading scores of some prelingually deaf adults show that optical speech signals are nonetheless rich in phonetic information. Hearing lipreaders are not as accurate as deaf lipreaders, but they too demonstrate perception of detailed optical phonetic information. This chapter briefly sketches the historical context of, and impediments to, knowledge about optical phonetics and visual speech perception (lipreading). We review findings on deaf and hearing lipreaders, and then review recent results on the relationships between optical speech signals and visual speech perception. We extend the discussion of these relationships to the development of visual speech synthesis, and we advocate for a close relationship between visual speech perception research and the development of synthetic visible speech.
Chapter Preview

Impediments To Knowledge About Optical Phonetics And Visual Speech Perception

One impediment to knowledge about visible speech is the presupposition that "a relatively small proportion of the information in speech is visually available" (Kuhl & Meltzoff, 1988, p. 240). It would not be surprising if researchers were not drawn to study a signal presumed to carry little intrinsic information: a supposedly impoverished visual speech stimulus might not seem to deserve the same rigorous approach that has been applied to the acoustic speech signal. Indeed, visible speech signals generally afford fewer phonetic cues than audible signals, particularly when listening conditions are favorable. Many of the activities of the vocal tract (Catford, 1977) that contribute to the acoustic speech signal (Stevens, 1998) are hidden from view. Paradoxically, the listener has greater access to vocal tract shape and activity than does the viewer, because the acoustic waveform is affected by all of the vocal tract settings.

For example, the hidden actions of the velum are critical to the control of nasality (e.g., the distinction between /b/, with the velum raised, and /m/, with the velum lowered). The hidden actions of the glottis contribute to voicing distinctions (e.g., pre-vocalic /b/, for which glottal vibration is initiated earlier, versus /p/). The degree to which the tongue restricts airflow is only partly visible, yet it determines manner of articulation (e.g., glide consonants such as /w/ have less restricted airflow than stop consonants such as /b/). The position and shape of the tongue are likewise responsible for distinguishing the vowels (Catford, 1977). Nevertheless, research has shown that the talking face is a rich source of phonetic information.
