Temporal Dependency of Multisensory Audiovisual Integration


Jingjing Yang (Okayama University, Japan & Changchun University of Science and Technology, China), Qi Li (Changchun University of Science and Technology, China), Yulin Gao (Okayama University, Japan) and Jinglong Wu (Okayama University, Japan)
DOI: 10.4018/978-1-4666-2113-8.ch033


In everyday life, our brains integrate various kinds of information from different modalities to perceive our complex environment. Multisensory integration requires that stimuli be close in space and time. Many studies have shown that temporal asynchrony between visual and auditory stimuli can influence multisensory integration. However, the neural mechanisms underlying the processing of asynchronous inputs are not well understood. Some researchers believe that humans have a relatively broad time window within which asynchronous stimuli from different modalities tend to be integrated into a single unified percept. Others believe that the human brain actively coordinates auditory and visual input so that we do not notice the asynchrony of multisensory stimuli. This review focuses on how the temporal factor affects the processing of audiovisual information.
Chapter Preview


We are constantly deluged with various kinds of information from multiple sensory organs. When we use a computer, we look at the screen, listen to the sound from the speaker, touch the mouse or keyboard, and so on. Information from different sensory organs is often efficiently merged to form a unified and robust percept. This process is referred to as multisensory integration (Lovelace, 2002; Frassinetti, 2003; Stein, 2008). Figure 1 shows a simple example: the impact of a falling ball, an event that simultaneously generates multisensory information. When the ball strikes the ground, it not only reflects light to our eyes but also creates air-borne vibrations that travel to our ears. Because these signals differ in physical nature, neither has any effect on the other. Some of these physical signals can be transduced by the nervous system: the retina transduces light, and the cochlea transduces air-borne pressure waves, into neural signals. These neural signals reach sensory-specific cortices, each giving rise to a distinct perception. Our nervous system can then automatically combine the signals from different sensory organs into a unified perception. A "multisensory stimulus" is therefore actually an event that generates several independent physical signals, each of which is simultaneously detectable by a different type of sensory receptor (Meredith, 2002).

Figure 1.

Multisensory audiovisual integration (from Qi Li and Jinglong Wu, 2010)


A typical example of audiovisual interaction is the McGurk effect, first described by McGurk and MacDonald in 1976. When a video of a speaker producing one phoneme is dubbed onto an audio recording of a different spoken phoneme, listeners often perceive a third, intermediate phoneme. For example, a visual /ga/ combined with an auditory /ba/ is often heard as /da/. The McGurk effect demonstrates an interaction between hearing and vision in speech perception.

Behavioral studies have shown, for example, that the simultaneous or near-simultaneous presentation of an auditory stimulus can influence the perceived temporal characteristics of a visual stimulus (Frassinetti, 2002; Lippert, 2007). Electrophysiological studies in nonhuman primates and other mammals have shown that sensory cues from different modalities that appear at the same time and in the same location can increase the firing rate of multisensory cells in the superior colliculus (Wallace, 1998; Meredith, 2002). Converging evidence from human behavioral research has demonstrated that stimuli from two or more sensory modalities presented in close spatial and temporal proximity can have a facilitative effect. Specifically, multimodal stimuli lead to faster detection and more accurate discrimination than unimodal stimuli: spatially and temporally coincident audiovisual stimuli are detected more easily and more quickly than either unimodal stimulus alone (Posner, 1980; Talsma, 2005; Molholm, 2002). The human attention system enables us to focus on task-relevant information and to ignore irrelevant information (Posner, 1980; Carrasco, 2004). However, task-irrelevant information cannot be completely ignored, because humans must constantly monitor their environment.

Whether auditory and visual stimuli are combined depends on the correspondence of their temporal and spatial information (Talsma, 2005; Michael, 2007). In real life, audiovisual stimulus onset asynchrony is a ubiquitous phenomenon. When watching TV, we often see the mouth movement before hearing the sound. Likewise, a flash of lightning is followed some seconds later by a rumble of thunder, because sound travels far more slowly than light: approximately 330 versus 300,000,000 meters per second.
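The size of this natural asynchrony follows directly from the two speeds given above. The short sketch below is only an illustration using those approximate values (it ignores temperature effects on the speed of sound and any neural transduction delays); the function name `arrival_lag_s` is hypothetical. It computes how long the sound of an event arrives after its light for a source at a given distance.

```python
# Approximate propagation speeds quoted in the text (illustrative values only).
SPEED_OF_SOUND_M_PER_S = 330.0
SPEED_OF_LIGHT_M_PER_S = 300_000_000.0


def arrival_lag_s(distance_m: float) -> float:
    """Seconds by which an event's sound reaches the observer after its light.

    Hypothetical helper: travel time of sound minus travel time of light,
    assuming both signals start at the same moment and place.
    """
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    return distance_m / SPEED_OF_SOUND_M_PER_S - distance_m / SPEED_OF_LIGHT_M_PER_S


if __name__ == "__main__":
    # A nearby speaker, a distant event, and a thunderstorm a few kilometres away.
    for d in (10.0, 1_000.0, 3_300.0):
        print(f"{d:>7.0f} m -> sound lags light by {arrival_lag_s(d) * 1000:.1f} ms")
```

With these values the lag is about 30 ms for a source 10 m away but about 10 s at 3.3 km, which is why thunder trails lightning by seconds while everyday audiovisual events at close range arrive nearly, though not exactly, in synchrony.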
