Audiovisual Integration of Natural Auditory and Visual Stimuli in the Real-World Situation

Xiaoyu Tang (Okayama University, Japan & Northeast Normal University, China), Yulin Gao (Okayama University, Japan), Weiping Yang (Okayama University, Japan), Ming Zhang (Northeast Normal University, China) and Jinglong Wu (Okayama University, Japan)
Bimodal audiovisual (AV) stimuli are detected or discriminated faster and more accurately than either visual or auditory unimodal stimuli. This effect is called audiovisual integration. Recently, researchers have increasingly focused on the audiovisual integration of natural auditory and visual stimuli in real-world situations. Audiovisual integration of naturalistic stimuli differs from that of non-naturalistic stimuli in several respects, such as the time at which integration occurs and the underlying neural mechanism. Factors affecting audiovisual integration in real-world situations are summarized here, with particular focus on temporal asynchrony and semantic matching. Stimuli used to study audiovisual integration in real-world situations should be strictly controlled, especially with respect to emotional factors, familiarity, semantic matching, and the matching of naturalistic with non-naturalistic stimuli. Future research should examine the influence of attention on audiovisual integration and the mechanism of audiovisual integration with naturalistic stimuli in real-world situations.
Traditionally, researchers have used simple non-natural stimuli, such as a colored square and a pure tone, to investigate the mechanisms of audiovisual integration. Audiovisual integration can occur at an early stage (40 ms after stimulus onset) and/or at a relatively late stage (160 ms) (Giard & Peronnet, 1999; Talsma & Woldorff, 2005). Early multisensory interactions occur primarily when stimuli in at least one modality are low in intensity (Senkowski, Saint-Amour, Höfle, & Foxe, 2011). However, those studies cannot draw conclusions about possible multisensory interactions in early evoked brain responses to stimuli with different intensity levels across modalities, such as a low-intensity auditory stimulus paired with a high-intensity visual stimulus. Selective exogenous (bottom-up) attention could modulate multisensory integration, or divided attention to auditory stimuli could be a prerequisite for early integration (Mozolic, Hugenschmidt, Peiffer, & Laurienti, 2008; Theeuwes, 1991; Van der Burg, Olivers, Bronkhorst, & Theeuwes, 2008). The role of endogenous (top-down) attention mechanisms in audiovisual integration has also been discussed (Talsma, Senkowski, Soto-Faraco, & Woldorff, 2010). Generally speaking, however, many problems remain unsolved, such as the stage at which integration occurs and the influence of selective and divided attention on multisensory integration; the findings remain controversial (Koelewijn, Bronkhorst, & Theeuwes, 2010; Mozolic et al., 2008; Schroeder & Foxe, 2005; Stein & Stanford, 2008).

