Content-Based Multimedia Retrieval Using Feature Correlation Clustering and Fusion

Hsin-Yu Ha (School of Computing and Information Sciences, Florida International University, Miami, FL, USA), Fausto C. Fleites (School of Computing and Information Sciences, Florida International University, Miami, FL, USA), and Shu-Ching Chen (School of Computing and Information Sciences, Florida International University, Miami, FL, USA)
DOI: 10.4018/jmdem.2013040103

Processing visual features alone is no longer sufficient for multimedia semantic retrieval, because multimedia data typically involve a variety of modalities, e.g., graphics, text, speech, and video. It becomes crucial to fully utilize the correlation between each feature and the target concept, the feature correlation within modalities, and the feature correlation across modalities. In this paper, the authors propose a Feature Correlation Clustering-based Multi-Modality Fusion Framework (FCC-MMF) for multimedia semantic retrieval. Features from different modalities are combined into one feature set with a common representation via a normalization and discretization process. Within and across modalities, multiple correspondence analysis (MCA) is utilized to obtain the correlation between feature-value pairs, which are then projected onto the first two principal components. The K-medoids algorithm, a widely used partitioning clustering algorithm, is selected to minimize the Euclidean distance within the resulting clusters and produce highly intra-correlated feature-value pair clusters. A majority vote is then applied to decide which cluster each feature belongs to. Once the feature clusters are formed, one classifier is built and trained per cluster. The correlation and confidence of each classifier are considered while fusing the classification scores, and mean average precision is used to evaluate the final ranked classification scores. Finally, the proposed framework is applied to the NUS-WIDE-Lite data set to demonstrate its effectiveness in multimedia semantic retrieval.
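The clustering step described above can be sketched in code. The following is a minimal, illustrative implementation (not the authors' original code): feature-value pairs are assumed to already be projected onto two principal components by MCA, K-medoids groups the resulting 2-D points under Euclidean distance, and a majority vote assigns each original feature to the cluster containing most of its pairs. The function names and the `pair_to_feature` mapping are hypothetical.

```python
import numpy as np

def k_medoids(points, k, max_iter=100, seed=0):
    """PAM-style K-medoids with Euclidean distance.

    points: (n, d) array of MCA-projected feature-value pairs.
    Returns (medoid_indices, labels).
    """
    rng = np.random.default_rng(seed)
    n = len(points)
    # Precompute the full pairwise Euclidean distance matrix.
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(max_iter):
        # Assign each point to its nearest medoid.
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if len(members) == 0:
                continue
            # New medoid: the member minimizing total intra-cluster distance.
            costs = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(costs)]
        if np.array_equal(np.sort(new_medoids), np.sort(medoids)):
            break  # converged
        medoids = new_medoids
    labels = np.argmin(dist[:, medoids], axis=1)
    return medoids, labels

def assign_features_by_majority_vote(pair_labels, pair_to_feature, k):
    """Each feature contributes several feature-value pairs; the feature
    joins the cluster that contains the majority of its pairs."""
    votes = {}
    for label, feat in zip(pair_labels, pair_to_feature):
        votes.setdefault(feat, np.zeros(k, dtype=int))[label] += 1
    return {feat: int(np.argmax(v)) for feat, v in votes.items()}
```

In the full framework, one classifier would then be trained on each feature cluster, with the per-cluster correlation and classifier confidence used as fusion weights for the final ranked scores.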
Article Preview

1. Introduction

As a result of rapid advances in contemporary technology, people routinely use smartphones to capture images, record videos, and instantly share multimedia content with accompanying descriptions over social networks, a trend that has caused multimedia data to propagate expeditiously around the world. A study by IDC and EMC stated that 1,800 EB (1 EB = 1,000 PB) of digital information were produced in 2011, a tenfold increase from 2005 to 2011 (Gantz et al., 2008). To manage this enormous volume of multimedia data, i.e., images, videos, texts, and audio, effectively retrieving data from different modalities and bridging the gap between low-level features and diverse semantic concepts becomes increasingly essential. Many researchers have investigated multi-modal fusion for multimedia analysis, e.g., video retrieval (Yan, Yang & Hauptmann, 2004; McDonald & Smeaton, 2005), speech recognition (Metallinou, Lee & Narayanan, 2010; Papandreou, Katsamanis, Pitsikalis & Maragos, 2009), and event detection (Jiang et al., 2010; Mertens, Lei, Gottlieb, Friedland & Divakaran, 2011). However, because of the multiple modalities involved, multi-modal fusion faces several challenges: coping with different feature formats, capturing correlation and independence among modalities at multiple levels, and determining the confidence level of each model for the task at hand.

To address these challenges, Atrey et al. (2010) posed several key questions for multimedia analysis, some of which are particularly relevant to content-based multimedia retrieval:
