Multimedia Representation

Bo Yang (Bowie State University, USA)
DOI: 10.4018/978-1-60566-014-1.ch135

Chapter Preview

Introduction

Multimedia Information Processing: Promises and Challenges

In recent years, multimedia applications have expanded rapidly into the daily life of computer users, partly because of the exponential growth of the Internet (Yang & Hurson, 2006). The integration of wireless communication, pervasive computing, and ubiquitous data processing with multimedia database systems has enabled the connection and fusion of distributed multimedia data sources. In addition, emerging applications such as smart classrooms, digital libraries, habitat/environment surveillance, traffic monitoring, and battlefield sensing have provided increasing motivation for research on multimedia content representation, data delivery and dissemination, data fusion and analysis, and content-based retrieval. Consequently, research on multimedia technologies is of increasing importance to the computing community. In contrast with traditional text-based systems, multimedia applications usually incorporate much more powerful descriptions of human thought: video, audio, and images (Karpouzis, Raouzaiou, Tzouveli, Iaonnou, & Kollias, 2003; Liu, Bao, Yu, & Xu, 2005; Yang & Hurson, 2005). Moreover, the large collections of data in multimedia systems make it possible to support more complex data operations such as imprecise queries and content-based retrieval. For instance, an image database system may accept an example picture and return the images most similar to it (Cox, Miller, & Minka, 2000; Hsu, Chua, & Pung, 2000; Huang, Chang, & Huang, 2003). However, the conveniences of multimedia applications come with challenges to existing data management schemes:

  • Efficiency: Multimedia applications generally require more resources than text-based ones; however, storage space and processing power are limited in many practical systems, for example, mobile devices and wireless networks (Yang & Hurson, 2005). Because of the large data volume and complicated operations of multimedia applications, new methods are needed to enable efficient representation, retrieval, and processing of multimedia data within these technical constraints.

  • Semantic Gap: There is a gap between users' perception of multimedia entities and the physical representation and access mechanisms of multimedia data. Users typically browse and wish to access multimedia data at the object level (“entities” such as human beings, animals, or buildings). However, existing multimedia retrieval systems tend to access multimedia data based on lower-level features (“characteristics” such as color patterns and textures), with little regard to combining these features into data objects. This representation gap often leads to higher processing cost and unexpected retrieval results. Representing multimedia data from a human perspective is a focus of recent research; however, few existing systems provide automated identification or classification of objects from general multimedia collections.

  • Heterogeneity: Collections of multimedia data are often diverse and poorly indexed. In a distributed environment, because of the autonomy and heterogeneity of data sources, multimedia data objects are often represented in heterogeneous formats. These differences in data format make it difficult to incorporate multimedia data objects under a single indexing framework.

  • Semantic Unawareness: Current research on content-based multimedia retrieval is based on feature vectors: features are extracted from audio/video streams or image pixels, empirically or heuristically, and combined into vectors according to the application criteria. Because multimedia formats are application-specific, the feature-based paradigm lacks scalability and accuracy.
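
The feature-vector paradigm and the query-by-example scenario described above can be illustrated with a minimal sketch. It is not the chapter's method: the choice of a coarse color histogram as the feature, Euclidean distance as the similarity measure, and every name in it (color_histogram, query_by_example, the toy collection) are assumptions made only for illustration. Images are modeled as plain lists of (r, g, b) tuples so that the example needs no external libraries.

```python
from math import sqrt

def color_histogram(pixels, bins_per_channel=2):
    """Map an image (list of (r, g, b) tuples, values 0-255) to a normalized
    color histogram. This is one hypothetical low-level feature vector."""
    step = 256 // bins_per_channel
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Quantize each channel into a bin, clamping to the last bin.
        rb = min(r // step, bins_per_channel - 1)
        gb = min(g // step, bins_per_channel - 1)
        bb = min(b // step, bins_per_channel - 1)
        hist[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1.0
    total = len(pixels)
    return [h / total for h in hist] if total else hist

def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def query_by_example(example_pixels, collection, top_k=2):
    """Rank images in `collection` (name -> pixels) by feature-space distance
    to the example image and return the names of the top_k closest."""
    query = color_histogram(example_pixels)
    ranked = sorted(
        collection,
        key=lambda name: euclidean(query, color_histogram(collection[name])),
    )
    return ranked[:top_k]

# Toy data (assumed): each "image" is just a bag of colored pixels.
RED, GREEN, BLUE = (220, 30, 30), (30, 200, 40), (20, 40, 210)
collection = {
    "sunset": [RED] * 90 + [BLUE] * 10,
    "forest": [GREEN] * 95 + [BLUE] * 5,
    "ocean": [BLUE] * 80 + [GREEN] * 20,
}
example = [RED] * 70 + [BLUE] * 30

print(query_by_example(example, collection))
```

Note that the ranking depends only on color statistics: a mostly red image of a car would match the "sunset" entry just as well as another sunset would. This is a small-scale instance of the semantic gap described above, since the feature vector carries no notion of the objects in the image.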

Key Terms in this Chapter

Latent Semantic Indexing: A technique for analyzing semantic relationships between a set of features and the concepts they contain by producing a set of concepts related to the features.

Elementary Entity: A data entity that semantically represents a basic object.

Summary Schemas Model: A content-aware organization prototype that enables imprecise queries on distributed heterogeneous data sources.

Canonical Correlation Analysis: A statistical method for making sense of cross-covariance matrices.

Annotation: Descriptive text attached to multimedia objects.

Decision Rule: Automatically generated standards that indicate the relationship between multimedia features and content information.

Representative Region: Areas with the most notable characteristics of a multimedia object.

Cluster: A group of content-similar multimedia objects.

Feature Extraction: Mapping a multimedia object to features.

Semantic-Based Representation: Describing multimedia content using semantic terms.
