A Web-Based Multimedia Retrieval System with MCA-Based Filtering and Subspace-Based Learning Algorithms


Chao Chen (Department of Electrical and Computer Engineering, University of Miami, Coral Gables, FL, USA), Tao Meng (Department of Electrical and Computer Engineering, University of Miami, Coral Gables, FL, USA) and Lin Lin (Department of Electrical and Computer Engineering, University of Miami, Coral Gables, FL, USA)
DOI: 10.4018/jmdem.2013040102


Research on intelligent multimedia services and applications is motivated by the high demand for convenient access to and distribution of pervasive multimedia data. Faced with abundant multimedia resources but inefficient and rather old-fashioned keyword-based retrieval approaches, Intelligent Multimedia Systems (IMS) demand (i) effective filtering algorithms for storage saving, computation reduction, and dynamic media delivery; and (ii) advanced learning methods to accurately identify target concepts, effectively search personalized media content, and enable media-type driven applications. Web-based multimedia applications are becoming increasingly popular, so how to apply web technology to multimedia data management and retrieval has become an important research topic. In this paper, the authors develop a web-based intelligent video retrieval system that integrates effective and efficient MCA-based filtering and subspace-based learning to help end users retrieve their desired semantic concepts. A web-based demo shows the effectiveness of the proposed intelligent multimedia system in providing relevant results for target semantic concepts retrieved from TRECVID video collections.

1. Introduction

With the rapid development of communication platforms, the advancement of digital recording techniques, and the decrease in storage cost, it has become much easier for people to access, collect, and distribute multimedia data. The intelligent multimedia services and applications in Intelligent Multimedia Systems (IMS) enable users to utilize multimedia data for entertainment, remote education, commerce/business, social communication, navigation, security/surveillance, etc. with cell phones or computers. However, traditional text-based/keyword-based retrieval frameworks fail to efficiently and precisely retrieve the objects, concepts, and/or events of interest to the users. The volume of multimedia data keeps growing, which requires heavy human effort for the annotation and indexing of the data, as well as for the performance evaluation of the retrieved results. Recently, IMS have been utilized to reduce human involvement in multimedia indexing and retrieval from a plethora of multimedia data (Kuper et al., 2003; Tseng et al., 2008). Furthermore, IMS have motivated researchers to devote themselves to the area of concept-based video retrieval (Lew, Sebe, Djeraba, & Jain, 2006; Shyu, Chen, Sun, & Yu, 2007; Snoek & Worring, 2008; Meng & Shyu, 2012; Meng & Shyu, 2013). Some IMS have been utilized in real-world applications. For example, IBM’s Query By Image and Video Content (QBIC) system (Flickner et al., 1995) has been used on the Hermitage Web site, which uses the QBIC engine to search archives of world-famous art by color. The Virage video engine (Hampapur et al., 1997) has been used by the Autonomy company. The Rich Media Management product automatically extracts and forms a conceptual and contextual understanding of the key concepts found in rich media assets to deliver advanced analytics, automatic categorization, summarization, concept clouds, dynamic content associations, content hyper-linking, as well as business process and workflow optimization.
In Hori and Aizawa (2003), a context-based video retrieval system for the life-log application was proposed, which continuously captures data from a wearable camera, a microphone, a brain-wave analyzer, or a GPS receiver for video browsing and retrieval. The Cortina System for large-scale content-based web image retrieval was integrated into the network processors to manage and deliver broadband router and gateway functions, content sharing and streaming, network storage, etc. (Quack, Monich, Thiele & Manjunath, 2004). An event-based sports video retrieval system was proposed in Tjondronegoro, Chen and Joly (2008), which utilizes a semi-schema-based indexing scheme on top of an object-relationship approach to make the system scalable and extensible. In addition, the ALIPR (Automatic Linguistic Indexing of Pictures - Real Time) system tags images automatically in real time based on statistical modeling, learning, and wavelet transforms (Li & Wang, 2008). A total of 46 image retrieval systems are described in Veltkamp and Tanase (2000). One famous example is WebSEEk (Smith & Chang, 1997), a content-based image and video catalog and search tool for the World Wide Web. The system makes an effort to catalog the visual information on the World Wide Web and includes over 650,000 images and videos from several sites. CIRES (content-based image retrieval system) is an image similarity search engine (Iqbal & Aggarwal, 2002); the system is able to serve queries ranging from scenes of purely natural objects such as vegetation, trees, and skies to images containing conspicuous structural objects such as buildings, towers, and bridges. VDBMS (Video Database Management System) is a data management system for advancing video database research (Aref et al., 2002); it supports comprehensive and efficient database management for digital video databases, including feature-based pre-processing for video content representation and indexing, video and meta-data storage, video query processing, buffer and storage management, and continuous video streaming. Moreover, XQuery is used to generate dynamic queries and user-oriented summaries.
In addition, more than 20 retrieval systems have been developed for TRECVID videos (Smeaton, Over & Kraaij, 2006); one of the more interesting systems is ForkClient from the MediaMill team at the University of Amsterdam (Snoek, Koelma & Smeulders, 2007). ForkClient visualizes results by displaying keyframes in the shape of a fork. The contents of the tines of the fork depend on the shot at the top of the stem. The center tine shows unseen query results, the left and right tines illustrate the time thread, and the two tines in between show user-assignable threads. For the TRECVID benchmark, two variants of visual similarity threads are displayed; the stem of the fork displays the history thread. Each displayed keyframe is taken from a single video shot, which can also be played on demand by rapidly displaying up to 16 frames in sequence from the originating shot. This helps in rapidly answering queries that contain explicit references to motion or events.
