Multimodal Information Fusion of Audiovisual Emotion Recognition Using Novel Information Theoretic Tools


Zhibing Xie (Ryerson Multimedia Research Lab, Ryerson University, Toronto, Canada) and Ling Guan (Ryerson Multimedia Research Lab, Ryerson University, Toronto, Canada)
DOI: 10.4018/ijmdem.2013100101


This paper provides a general theoretical analysis of multimodal information fusion and applies novel information theoretic tools to a multimedia application. The most essential issues in information fusion include feature transformation and reduction of feature dimensionality. Most previous solutions are based on second-order statistics, which are optimal only for Gaussian-like distributions. In contrast, this paper describes kernel entropy component analysis (KECA), which uses information entropy as its descriptor and achieves improved performance through entropy estimation. The authors present a new solution that integrates information fusion theory with information theoretic tools. The proposed method is applied to audiovisual emotion recognition, with information fusion implemented for the audio and video channels at both the feature level and the decision level. Experimental results demonstrate that the proposed algorithm achieves improved performance in comparison with existing methods, especially when the dimension of the feature space is substantially reduced.
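As a rough illustration of how an entropy-based component selection can differ from a variance-based one, the following minimal sketch follows the commonly published formulation of KECA: a Gaussian kernel matrix is eigendecomposed, and the retained axes are those contributing most to a Parzen-window estimate of the Renyi quadratic entropy, rather than those with the largest eigenvalues. It is not the authors' implementation; the function name `keca`, the kernel width `sigma`, and the random input data are assumptions made only for this example.

```python
import numpy as np

def keca(X, n_components=2, sigma=1.0):
    """Sketch of kernel entropy component analysis (assumed Gaussian kernel)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise squared Euclidean distances, clipped at zero for numerical safety.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    # Uncentered Gaussian kernel matrix (Parzen normalization constant omitted).
    K = np.exp(-d2 / (2.0 * sigma ** 2))
    eigvals, eigvecs = np.linalg.eigh(K)
    eigvals = np.clip(eigvals, 0.0, None)
    # Entropy estimate: (1/n^2) 1^T K 1 = (1/n^2) * sum_i lambda_i * (1^T e_i)^2.
    contrib = eigvals * eigvecs.sum(axis=0) ** 2 / n ** 2
    # Keep the axes with the largest entropy contribution, not the largest eigenvalue.
    order = np.argsort(contrib)[::-1][:n_components]
    Z = eigvecs[:, order] * np.sqrt(eigvals[order])
    return Z, contrib[order], contrib.sum(), K

# Hypothetical data: 60 samples of a 5-dimensional feature vector.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
Z, top_contrib, entropy_estimate, K = keca(X, n_components=2, sigma=1.0)
```

Here `Z` holds the reduced features, and `entropy_estimate` equals the mean of the kernel matrix, which is the quantity whose largest contributors are retained.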
Article Preview


The effective utilization of information fusion is becoming an increasingly essential issue in numerous areas, since it can achieve more reliable results by analyzing multiple data sources, extracted features, and intermediate decisions (Joshi, Dey & Samanta, 2009; Ross & Jain, 2003). Recognition results based on a single modality are usually far from satisfactory due to insufficient data. On the other hand, the potential of multimodal fusion to overcome the limitations of a single modality has led to growing interest in this research area. Since different sensors may carry redundant, complementary, or even conflicting information, utilizing the useful data and eliminating the conflicting information leads to improved overall system performance.

However, the benefits of information fusion usually come with certain challenges. The main obstacles lie in identifying the dissimilar characteristics of multiple modalities and in selecting optimal fusion algorithms (Guan, Wang & Zhang, 2010). Multiple modalities are usually captured in various formats at various rates. Different data sources often have different levels of confidence and reliability. Moreover, both the independence and the correlation of different modalities provide valuable insight under different scenarios. Therefore, the fusion process needs to treat the above issues properly. Many research efforts have been made to improve the techniques of information fusion.

Multimodal fusion is categorized into four levels: data level, feature level, score level, and decision level (Atrey, Hossain & Saddik, 2010; Shivappa, Trivedi & Rao, 2010). Data level fusion refers to the combination of raw data from several sensors. It is not widely utilized due to the incompatibility of different modalities. Feature level fusion refers to the integration of different feature vectors. Although the feature level retains rich information about the raw data, it is difficult to obtain the intrinsic correlation between heterogeneous features. Score level fusion refers to the combination of matching scores provided by different modalities. Its advantages are simple implementation and scalability. Decision level fusion refers to the combination of decisions from separate classifiers. Decision level fusion allows flexibility in the choice of individual classifiers, but it loses correlation information.
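The difference between the feature, score, and decision levels can be sketched in a few lines. The example below is only an illustration under simple assumptions: feature level fusion is shown as plain concatenation, score level fusion as a weighted sum of per-class scores, and decision level fusion as a majority vote. The function names, the weight `w_audio`, and all numeric values are hypothetical and do not come from the paper.

```python
from collections import Counter

def feature_level_fusion(audio_features, video_features):
    """Concatenate the two feature vectors into one joint vector."""
    return list(audio_features) + list(video_features)

def score_level_fusion(audio_scores, video_scores, w_audio=0.5):
    """Combine per-class matching scores with a weighted sum."""
    fused = {
        label: w_audio * audio_scores[label] + (1.0 - w_audio) * video_scores[label]
        for label in audio_scores
    }
    best_label = max(fused, key=fused.get)
    return best_label, fused

def decision_level_fusion(decisions):
    """Majority vote over the hard labels produced by separate classifiers."""
    return Counter(decisions).most_common(1)[0][0]

# Hypothetical inputs for one audiovisual sample.
audio_features = [0.12, 0.80, 0.33]
video_features = [0.45, 0.05]
joint_features = feature_level_fusion(audio_features, video_features)

audio_scores = {"happy": 0.6, "angry": 0.3, "sad": 0.1}
video_scores = {"happy": 0.2, "angry": 0.7, "sad": 0.1}
score_label, fused_scores = score_level_fusion(audio_scores, video_scores)

vote_label = decision_level_fusion(["happy", "angry", "happy"])
```

Note that the score level result depends on how much the two channels disagree and on the chosen weight, whereas the vote sees only the final labels, which is the loss of correlation information mentioned above.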

In practical applications, it is highly likely that the extracted data are incomplete and imprecise due to the heterogeneous measurements of different modalities. Hence, integrating complementary data and eliminating redundant information are essential before classification is implemented. Among numerous methods, the most popular linear strategies include linear discriminant analysis (LDA), principal component analysis (PCA), canonical correlation analysis (CCA), and cross-modal factor analysis (CFA) (Lazaridis, Axenopoulos & Rafailidis, 2013). One widely used approach is CCA, which realizes linear dimensionality reduction and fusion by computing maximally correlated linear projections (Shin & Park, 2011). Unlike CCA, CFA is a novel method that represents the coupled patterns between two different subsets of features through cross-modal association. CFA provides feature selection capability in addition to feature dimension reduction (Wang, Guan & Venetsanopoulos, 2011 July). Recently, there has been extensive interest in non-linear feature transformation. Instead of assuming a linear relationship, kernel methods have been proposed to capture non-linear correlation among the original data, leading to kernel PCA (Schölkopf, Smola & Müller, 1997), kernel CCA (Xu & Mu, 2007) and kernel CFA (Wang, Guan & Venetsanopoulos, 2011 May).
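To make the idea of "maximally correlated linear projections" concrete, the following minimal sketch computes linear CCA by whitening each modality's covariance and taking the singular value decomposition of the whitened cross-covariance. It is a textbook formulation rather than the method of any cited paper; the function name `cca`, the small ridge term `reg`, and the synthetic "audio" and "video" features driven by one shared latent variable are assumptions made for the example.

```python
import numpy as np

def cca(X, Y, n_components=1, reg=1e-6):
    """Linear CCA sketch: returns projections Wx, Wy and canonical correlations."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    # Covariances, with a small ridge term (assumed) to keep them invertible.
    Cxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / (n - 1)

    def inv_sqrt(C):
        # Inverse matrix square root via the symmetric eigendecomposition.
        vals, vecs = np.linalg.eigh(C)
        return vecs @ np.diag(vals ** -0.5) @ vecs.T

    Cxx_is, Cyy_is = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, s, Vt = np.linalg.svd(Cxx_is @ Cxy @ Cyy_is)
    Wx = Cxx_is @ U[:, :n_components]
    Wy = Cyy_is @ Vt.T[:, :n_components]
    return Wx, Wy, s[:n_components]

# Synthetic two-modality data sharing one latent factor (hypothetical).
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 1))
audio = latent @ rng.normal(size=(1, 4)) + 0.1 * rng.normal(size=(500, 4))
video = latent @ rng.normal(size=(1, 3)) + 0.1 * rng.normal(size=(500, 3))

Wx, Wy, corrs = cca(audio, video, n_components=1)
audio_proj = (audio - audio.mean(axis=0)) @ Wx
video_proj = (video - video.mean(axis=0)) @ Wy
# One simple fused representation: concatenate the projected features.
fused = np.hstack([audio_proj, video_proj])
empirical_corr = np.corrcoef(audio_proj[:, 0], video_proj[:, 0])[0, 1]
```

The kernel variants mentioned above replace the covariance matrices with kernel matrices so that the same correlation-maximizing idea applies to non-linear relationships.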
