Multimedia applications have grown explosively in recent years, driven by the steady increase in available processing power and bandwidth. This growth generates large amounts of media data that must be organized and stored effectively and efficiently. Although these applications produce and consume vast amounts of multimedia data, the technologies for organizing and searching such data are still immature. The data are usually stored in multimedia archives that rely on search engines to let users retrieve the required information. This article reviews the stages of video indexing and retrieval, and discusses background, current research directions, and outstanding problems.
Searching a repository of data is a well-known and important task, and its effectiveness generally determines whether the required information is obtained. A valuable lesson from the explosion of the Web is that the usefulness of vast repositories of digital information is limited by the effectiveness of the access methods, which underscores the importance of providing effective search techniques. For alphanumeric databases, many portals (Acuna, Marcos, Gomez, & Bussler, 2005) have become widely accessible via the Web. These portals use search engines that adopt keyword-based search models to access the stored information, but the inaccuracy of their search results is a known issue.
For multimedia data, describing unstructured information (such as video) with textual terms is not an effective solution, because such content cannot be uniquely described by a fixed set of statements. Human interpretations vary from one person to another (Tešic & Smith, 2006), so two people may describe the same image in totally different terms. The highly unstructured nature of multimedia data therefore renders keyword-based search techniques inadequate. Video streams are considered the most complex form of multimedia data because they contain almost all other forms, such as images and audio, in addition to their inherent temporal dimension.
One promising solution that enables searching multimedia data in general, and video data in particular, is content-based search and retrieval (Deb, 2005). The basic idea is to access video data by their contents, for example by using one of the visual content features. Recognizing the importance of content-based searching, researchers have begun investigating the problem and proposing creative solutions. Most of the proposed video indexing and retrieval prototypes have two major phases, database population and retrieval (Marques & Furht, 2002):
The database population phase consists of the following steps:
Shot Boundary Detection: The purpose of this step is to partition a video stream into a set of meaningful and manageable segments (Hanjalic, 2002), which then serve as the basic units for indexing.
Key Frames Selection: This step attempts to summarize the information in each shot by selecting representative frames that capture the salient characteristics of that shot.
Extracting Low-Level Features from Key Frames: In this step, low-level spatial features (color, texture, etc.) are extracted to serve as indexes to key frames and hence to shots. Temporal and other features (e.g., object motion) are also used.
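The three population steps above can be sketched in a few lines of Python. This is only an illustrative sketch under strong assumptions, not the method of any cited system: frames are modeled as flat lists of grayscale pixel values, a shot boundary is declared wherever the intensity histograms of consecutive frames differ by more than a fixed threshold, the key frame is the frame closest to the shot's mean histogram, and the extracted feature is that key frame's histogram. All function names and parameters (histogram, detect_shots, select_key_frame, build_index, threshold, bins) are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Frame = Sequence[int]  # assumed toy representation: flattened grayscale pixels in 0..255


def histogram(frame: Frame, bins: int = 8) -> List[float]:
    """Normalized intensity histogram of a non-empty frame (a simple low-level feature)."""
    counts = [0] * bins
    for pixel in frame:
        counts[min(pixel * bins // 256, bins - 1)] += 1
    return [c / len(frame) for c in counts]


def hist_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """L1 distance between two normalized histograms; ranges from 0 to 2."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def detect_shots(frames: Sequence[Frame], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Split a frame sequence into (start, end) shots, end exclusive.

    A boundary is placed wherever consecutive histograms differ by more
    than `threshold` (an assumed, hand-picked value).
    """
    if not frames:
        return []
    hists = [histogram(f) for f in frames]
    boundaries = [0]
    for i in range(1, len(frames)):
        if hist_distance(hists[i - 1], hists[i]) > threshold:
            boundaries.append(i)
    boundaries.append(len(frames))
    return list(zip(boundaries[:-1], boundaries[1:]))


def select_key_frame(frames: Sequence[Frame], shot: Tuple[int, int]) -> int:
    """Return the index of the frame whose histogram is closest to the shot's mean histogram."""
    start, end = shot
    hists = [histogram(frames[i]) for i in range(start, end)]
    mean = [sum(column) / len(hists) for column in zip(*hists)]
    best = min(range(len(hists)), key=lambda k: hist_distance(hists[k], mean))
    return start + best


def build_index(frames: Sequence[Frame], threshold: float = 0.5) -> List[Dict]:
    """Run the three population steps and return one index entry per shot."""
    index = []
    for shot in detect_shots(frames, threshold):
        key = select_key_frame(frames, shot)
        index.append({"shot": shot, "key_frame": key, "feature": histogram(frames[key])})
    return index


# Toy video: three dark frames followed by two bright frames.
dark, bright = [10] * 16, [240] * 16
video = [dark, dark, dark, bright, bright]
index = build_index(video)
for entry in index:
    print(entry["shot"], entry["key_frame"])
```

On this toy input the sketch finds two shots, frames 0 to 2 and frames 3 to 4, with frames 0 and 3 chosen as key frames. A fixed histogram threshold only catches abrupt cuts; gradual transitions such as fades and dissolves would need a different detector.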
In the retrieval phase, a query is presented to the system, which in turn performs similarity matching operations and returns similar data (if found) to the user.
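Similarity matching in the retrieval phase can be illustrated with a minimal sketch. It assumes that each indexed shot is represented by a normalized feature vector (for example, a color histogram of its key frame), that the query has already been converted to the same kind of vector, and that similarity is measured by L1 distance. The names retrieve, top_k, and max_distance, as well as the sample index, are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """L1 distance between two feature vectors of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b))


def retrieve(
    query_feature: Sequence[float],
    index: Dict[str, Sequence[float]],
    top_k: int = 3,
    max_distance: float = 1.0,
) -> List[Tuple[str, float]]:
    """Return up to `top_k` (shot_id, distance) pairs, nearest first.

    Shots farther than `max_distance` are dropped, so the result may be
    empty when nothing similar is found.
    """
    scored = [(l1_distance(query_feature, feature), shot_id)
              for shot_id, feature in index.items()]
    scored = [pair for pair in scored if pair[0] <= max_distance]
    scored.sort()
    return [(shot_id, distance) for distance, shot_id in scored[:top_k]]


# Hypothetical index: shot identifier -> 4-bin normalized histogram.
index = {
    "shot_a": [0.75, 0.25, 0.0, 0.0],
    "shot_b": [0.0, 0.0, 0.5, 0.5],
    "shot_c": [0.5, 0.5, 0.0, 0.0],
}
results = retrieve([1.0, 0.0, 0.0, 0.0], index)
print(results)
```

Here shot_a ranks first and shot_c second, while shot_b is rejected because its distance exceeds max_distance. A linear scan like this is adequate only for small collections; larger archives would typically need an index structure for nearest-neighbor search.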
It is worth mentioning that a growing trend in current content-based retrieval systems is the application of contextual constraints to enrich those systems with additional metadata (Davis, King, Good, & Sarvas, 2004). The use of context makes video retrieval systems both content-based and context-based at the same time. Context-based techniques try to improve retrieval performance by using associated contextual information other than that derived from the media content (Hori & Aizawa, 2003).