1. Introduction
Managing multimedia databases requires the ability to retrieve meaningful information from digital data, both to help users find relevant multimedia content more effectively and to enable better forms of entertainment. Motivated by many requirements and applications such as sports highlight detectors, movie recommenders, image search engines, and music libraries, multimedia retrieval and semantic detection have become very popular research topics in recent years (Lew, Sebe, Djeraba & Jain, 2006; Shyu, Chen, Sun & Yu, 2007; Snoek & Worring, 2008). Supervised content-based multimedia retrieval generally consists of four steps: segmentation of the multimedia data (i.e., detecting the basic units for processing), representation of the multimedia data (i.e., extracting low-level features from each unit), model training on the low-level features, and classification of the testing data with the trained model.
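The four steps of supervised content-based retrieval can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the method of any cited work: the fixed-length segmentation, the three toy features (mean, spread, range), the nearest-centroid classifier, and the function names `segment`, `extract_features`, `train`, and `classify` are all hypothetical. A real system would segment into shots or frames and extract color, texture, audio, or motion descriptors instead.

```python
# Hypothetical sketch of the four-step supervised retrieval pipeline:
# segmentation -> low-level features per unit -> model training -> classification.
from statistics import mean, pstdev


def segment(signal, unit_len):
    """Step 1: split a 1-D signal into non-overlapping units (a trailing partial unit is dropped)."""
    return [signal[i:i + unit_len] for i in range(0, len(signal) - unit_len + 1, unit_len)]


def extract_features(unit):
    """Step 2: represent a unit by toy low-level features (mean, spread, range)."""
    return (mean(unit), pstdev(unit), max(unit) - min(unit))


def train(feature_vectors, labels):
    """Step 3: learn one centroid (per-feature mean) for each concept label."""
    groups = {}
    for vec, label in zip(feature_vectors, labels):
        groups.setdefault(label, []).append(vec)
    return {label: tuple(mean(col) for col in zip(*vecs)) for label, vecs in groups.items()}


def classify(model, vec):
    """Step 4: assign the label whose centroid is closest in squared Euclidean distance."""
    return min(model, key=lambda label: sum((a - b) ** 2 for a, b in zip(vec, model[label])))


# Toy training data: four units, each annotated with a concept label.
train_signal = [0, 0, 1, 0, 9, 1, 8, 2, 0, 1, 0, 0, 7, 0, 9, 1]
units = segment(train_signal, 4)
labels = ["quiet", "active", "quiet", "active"]
model = train([extract_features(u) for u in units], labels)

# Testing data goes through the same segmentation and feature extraction.
test_units = segment([1, 0, 0, 1, 8, 0, 9, 2], 4)
predictions = [classify(model, extract_features(u)) for u in test_units]
print(predictions)  # ['quiet', 'active']
```

Nearest-centroid is used only because it keeps the training step to a few lines; any supervised classifier could take its place in step 3 without changing the other steps.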
The most frequently used features for image retrieval are low-level features such as color, texture, and shape (Datta, Joshi, Li & Wang, 2008), while video retrieval uses these visual features together with low-level audio and motion features (Lew, Sebe, Djeraba & Jain, 2006). One of the biggest challenges of multimedia retrieval is bridging the semantic gap between the low-level features and high-level features or concepts. Traditionally, all low-level features are considered to contribute equally to a model, and each model is trained on every feature it is given. Later approaches require the models to select the features that best represent a given concept class. In this manner, the features are selected before model training, and hence the models do not necessarily benefit from the feature selection process (Lin, Ravitz, Shyu, & Chen, 2008; Liu & Motoda, 1998). From another point of view, features are no longer treated as equally important; instead, feature selection labels each one as either “good” or “bad”.
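The contrast between equal weighting and binary feature selection can be sketched as follows. This is a hypothetical illustration: the Fisher-style separability score, the top-k filter, the toy data, and the helper names (`fisher_scores`, `select_features`, `weighted_sq_distance`, `nearest_label`) are assumptions, not the procedure of the cited works. Equal weighting gives every feature weight 1; selection, performed before any model is used, turns each weight into 1 (“good”) or 0 (“bad”).

```python
# Hypothetical sketch: equal feature weights vs. a binary mask chosen by a filter
# before the model is applied. The scoring criterion is an assumed example.
from statistics import mean, pvariance


def fisher_scores(vectors, labels, pos, neg):
    """Per-feature two-class separability: (mu_pos - mu_neg)^2 / (var_pos + var_neg)."""
    scores = []
    for j in range(len(vectors[0])):
        a = [v[j] for v, lab in zip(vectors, labels) if lab == pos]
        b = [v[j] for v, lab in zip(vectors, labels) if lab == neg]
        # A tiny epsilon avoids division by zero when both classes are constant.
        denom = pvariance(a) + pvariance(b) + 1e-12
        scores.append((mean(a) - mean(b)) ** 2 / denom)
    return scores


def select_features(scores, k):
    """Filter-style selection: keep the k best features ("good" = 1, "bad" = 0)."""
    keep = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
    return [1 if j in keep else 0 for j in range(len(scores))]


def weighted_sq_distance(u, v, weights):
    """Squared distance where each feature contributes in proportion to its weight."""
    return sum(w * (a - b) ** 2 for a, b, w in zip(u, v, weights))


def nearest_label(query, vectors, labels, weights):
    """1-nearest-neighbor classification under the given feature weights."""
    i = min(range(len(vectors)), key=lambda i: weighted_sq_distance(query, vectors[i], weights))
    return labels[i]


# Toy data: feature 0 separates the classes; feature 1 has equal class means.
X = [(0.1, 5.0), (0.2, 1.0), (0.9, 4.0), (1.0, 2.0)]
y = ["neg", "neg", "pos", "pos"]
scores = fisher_scores(X, y, "pos", "neg")
equal_weights = [1] * len(scores)        # traditional: every feature counts equally
selected = select_features(scores, k=1)  # selection: each feature is "good" or "bad"

q = (0.95, 4.9)
print(selected)                                # [1, 0]
print(nearest_label(q, X, y, equal_weights))   # neg
print(nearest_label(q, X, y, selected))        # pos
```

With equal weights, the uninformative second feature pulls the query toward a “neg” example; once selection keeps only the first feature, the same 1-nearest-neighbor rule returns “pos”. Because the mask is fixed before classification, the model itself has no influence on which features are kept.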