Machine Learning Techniques for Adaptive Multimedia Retrieval: Technologies Applications and Perspectives

Chia-Hung Wei (Ching Yun University, Taiwan) and Yue Li (Nankai University, China)
Indexed In: SCOPUS
Release Date: October, 2010|Copyright: © 2011 |Pages: 408
DOI: 10.4018/978-1-61692-859-9
ISBN13: 9781616928599|ISBN10: 161692859X|EISBN13: 9781616928612
Description & Coverage
Description:

As the size of multimedia databases grows, retrieval becomes a key challenge in multimedia database management. Accordingly, it is necessary to apply machine learning techniques to automatically tune the mechanism of multimedia retrieval systems.

Machine Learning Techniques for Adaptive Multimedia Retrieval: Technologies Applications and Perspectives disseminates current information on multimedia retrieval, advances the field of multimedia databases, and educates the multimedia database community. It is a critical text for professionals who are engaged in efforts to understand machine learning techniques for adaptive multimedia retrieval research, design and applications.

Coverage:

The many academic areas covered in this publication include, but are not limited to:

  • Artificial Intelligence in Multimedia Databases Technologies
  • Content Processing, Analysis, Extraction, Synthesis, and Representation
  • Indexing, Searching, Retrieving, Querying, and Archiving Multimedia Databases
  • Metadata Generation, Coding and Transformation
  • Multimedia Database Integration and Query Languages
  • Multimedia for Interactive Services
  • Multimedia Security, Modeling, Coding, and Compression
  • Semantic Web and Ontology
  • Sequence Database Techniques, Indexing, and Approximate Matching
  • User Interaction and Relevance Feedback
Reviews and Testimonials

This book focuses on theories, methods, algorithms, and applications of multimedia retrieval using machine learning techniques. The mission of this book is to disseminate state-of-the-art multimedia retrieval, advance the field of multimedia databases, and educate the multimedia database community. The individual chapters are contributed by different authors and present various solutions to the different kinds of problems concerning machine learning for multimedia retrieval. The prospective audience of this book is academics, scientists, practitioners and engineers who are engaged in efforts to understand the state of the art in multimedia retrieval research, design and applications. This book can also be used as a supplement in multimedia-related courses for lecturers, upper-level undergraduates and graduate students. Moreover, fellow researchers and PhD students intending to broaden their scope or looking for a research topic in multimedia retrieval may find the book inspiring.

– Chia-Hung Wei, Ching Yun University, Taiwan; and Yue Li, Nankai University, China
Editor Biographies
Chia-Hung Wei is currently an assistant professor in the Department of Information Management at Ching Yun University, Taiwan. He obtained his Ph.D. degree in Computer Science from the University of Warwick, UK, his Master's degree from the University of Sheffield, UK, and his Bachelor's degree from Tunghai University, Taiwan. His research interests include content-based image retrieval, digital image processing, medical image processing and analysis, machine learning for multimedia applications and information retrieval. He has published over 10 research papers in those research areas.
Yue Li received his Ph.D. degree in Computer Science from the University of Warwick, UK, in 2009, his M.S. in Information Technology from the Department of Computer Science, University of Nottingham, UK, in 2005, and his B.Sc. in Mathematics from Nankai University, China, in 2003. He is currently an assistant professor in the College of Software, Nankai University, China. He serves as a member of the editorial review board of the International Journal of Digital Crime and Forensics. His research interests include digital forensics, multimedia security, digital watermarking, pattern recognition, machine learning and content-based image retrieval.
Editorial Review Board
  • Qi Tian, University of Texas at San Antonio, USA
  • Alan Hanjalic, Delft University of Technology, The Netherlands
  • Stefano Berretti, University of Florence, Italy
  • Marcel Worring, University of Amsterdam, The Netherlands
  • George Ioannidis, University of Bremen, Germany
  • Mei-Ling Shyu, University of Miami, USA
  • Shu-Ching Chen, Florida International University, USA
  • Yixin Chen, University of Mississippi, USA
  • Min Chen, University of Montana, Missoula, USA
  • Qiang Cheng, Southern Illinois University, USA
  • Clement Leung, Victoria University, Australia
  • Remco Veltkamp, Utrecht University, The Netherlands
  • Xin-Jing Wang, Microsoft Research Asia
  • Mohan S Kankanhalli, National University of Singapore, Singapore
  • Jianping Fan, The University of North Carolina at Charlotte, USA
  • Zhongfei Zhang, State University of New York (SUNY) at Binghamton, USA
  • Man-Kwan Shan, National Chengchi University, Taiwan
  • Bill Grosky, University of Michigan – Dearborn, USA
Peer Review Process
The peer review process is the driving force behind all IGI Global books and journals. All IGI Global reviewers maintain the highest ethical standards and each manuscript undergoes a rigorous double-blind peer review process, which is backed by our full membership to the Committee on Publication Ethics (COPE).
Ethics & Malpractice
IGI Global book and journal editors and authors are provided written guidelines and checklists that must be followed to maintain the high value that IGI Global places on the work it publishes. Because IGI Global is a full member of the Committee on Publication Ethics (COPE), all editors, authors and reviewers must adhere to specific ethical and quality standards, which include IGI Global’s full ethics and malpractice guidelines and editorial policies. These apply to all books, journals, chapters, and articles submitted and accepted for publication. To review our full policies, conflict of interest statement, and post-publication corrections, view IGI Global’s Full Ethics and Malpractice Statement.

Preface

Multimedia retrieval refers to technology used to search multimedia databases for various types of digital multimedia information, such as text, images, graphics, video, and audio. The technology makes it possible for database users to locate desired multimedia information, analyze the characteristics of multimedia sets, and discover knowledge hidden in vast amounts of multimedia objects. As the size of multimedia databases grows, retrieval has become a key challenge in multimedia database management. The challenge in retrieving desired multimedia information comes from three aspects. The first is the users' difficulty in specifying their information needs in the form of a predefined query. The second is the problem of extracting semantics from the multimedia content. The third is that user-specific interests and the search context are usually neglected when objects are retrieved. To improve the performance of retrieval systems, it is necessary to apply machine learning methods to tune the mechanism of the retrieval system to the user’s information needs in the search process. Machine learning is concerned with the design and development of algorithms and techniques that allow computers to learn as humans do. Applying machine learning to multimedia retrieval makes retrieval systems more intelligent, so that they can sensibly tackle various problems and issues. Owing to the importance of the field, a significant amount of research effort has been devoted to it around the world.

This book focuses on theories, methods, algorithms, and applications of multimedia retrieval using machine learning techniques. The mission of this book is to disseminate state-of-the-art multimedia retrieval, advance the field of multimedia databases, and educate the multimedia database community. The individual chapters are contributed by different authors and present various solutions to the different kinds of problems concerning machine learning for multimedia retrieval. The prospective audience of this book is academics, scientists, practitioners and engineers who are engaged in efforts to understand the state of the art in multimedia retrieval research, design and applications. This book can also be used as a supplement in multimedia-related courses for lecturers, upper-level undergraduates and graduate students. Moreover, fellow researchers and PhD students intending to broaden their scope or looking for a research topic in multimedia retrieval may find the book inspiring.

This book includes 15 chapters, which are organized into four sections. The first section provides fundamental techniques for multimedia and solutions to specific applications in image retrieval. In the second section, four chapters discuss semantic analysis, annotation, and knowledge discovery. The third section presents approaches to video analysis, indexing, and retrieval. In the last section, three chapters address music information analysis and retrieval. Each chapter is briefly described as follows:

As multimedia databases grow in size, it becomes impractical to manually annotate all contents and attributes of the media, and the difficulty of finding desired information increases. To cope with these challenges, content-based multimedia retrieval systems have been developed for various applications. In Chapter 1 of this book, Chia-Hung Wei and Sherry Y Chen provide a conceptual architecture for the design of a content-based retrieval system, and discuss the essential components of such a system and their research issues, including feature extraction and representation, dimension reduction of feature vectors, indexing, and query specification.
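
As a rough illustration of the components named above (feature extraction, indexing, and query by example), the following Python sketch may help. It is not taken from Chapter 1: the intensity-histogram feature, the dictionary used as a linear index, the toy pixel lists, and all function names are assumptions chosen for brevity.

```python
from math import sqrt

def gray_histogram(pixels, bins=4, max_value=256):
    """Feature extraction: normalized intensity histogram of a flat pixel list."""
    counts = [0] * bins
    for p in pixels:
        counts[p * bins // max_value] += 1
    total = len(pixels)
    return [c / total for c in counts]

def euclidean(a, b):
    """Distance between two feature vectors of equal length."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_index(images):
    """Indexing (hypothetical, linear): map each image name to its feature vector."""
    return {name: gray_histogram(px) for name, px in images.items()}

def query_by_example(index, query_pixels, k=2):
    """Return the names of the k indexed images closest to the query image."""
    q = gray_histogram(query_pixels)
    ranked = sorted(index, key=lambda name: euclidean(index[name], q))
    return ranked[:k]

# Toy "images": flat lists of 8-bit gray levels (made-up data).
images = {
    "dark": [10, 20, 30, 40],
    "bright": [200, 220, 240, 250],
    "mixed": [10, 100, 150, 250],
}
index = build_index(images)
result = query_by_example(index, [5, 15, 25, 35], k=2)
print(result)
```

A real system would replace the linear scan with a proper index structure and would usually reduce the dimension of much larger feature vectors before matching.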

In Chapter 2, Dmitry Kinoshenko et al. propose a metric on partitions of arbitrary measurable sets, together with its special properties, for metric content-based image retrieval based on the ‘spatial’ semantics of images. The approach considers images represented in the form of nested partitions produced by any segmentation. A nested-partition representation expresses the degree of information refinement or coarsening, and so not only corresponds to rational content control but also supports the creation of specific search algorithms and the synthesis of hierarchical models of image search that reduce the number of matching operations between query and database elements.

The study of content-based information retrieval has focused on such approaches as search-by-association, aimed search, and category search. In information and multimedia retrieval, any retrieval scheme is based on query matching, that is, on finding the data points most similar to the query. R*-trees can be used to find those similar data points. In Chapter 3, Jiaxiong Pi et al. not only use R*-trees to improve K-means and hierarchical clustering methods, but also extend the application of R*-trees to cluster analysis for similarity retrieval of images.

In multimedia applications, 2D and 3D images can be used jointly to improve pattern recognition. In face recognition, Stefano Berretti et al. propose an original framework that performs recognition by using manifold embedding and machine learning techniques applied to the face representations extracted from 2D face images and from 3D face models. Objectives of Chapter 4 are twofold. On the one hand, an original approach based on the computation of radial geodesic distances (RGD) is proposed to represent two-dimensional (2D) face images and three-dimensional (3D) face models for the purpose of face recognition. On the other hand, face representations based on RGDs are used for the purpose of face identification by using them in an operative framework that exploits state-of-the-art techniques for manifold embedding and machine learning. 

With the rapid increase in the number of registered trademarks around the world, trademark image retrieval systems have been developed in recent years. However, some conventional approaches to feature extraction, such as moment invariants, Zernike moments, Fourier descriptors and curvature scale space descriptors, have major deficiencies when applied to the trademark image retrieval problem. In Chapter 5, Wing-Yin Chau et al. propose a novel approach to overcome these deficiencies. The proposed approach combines Zernike moment descriptors with the centroid distance representation and the curvature representation. The experimental results show that the proposed approach outperforms the conventional approaches in several circumstances.
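
To make the centroid distance representation mentioned above more concrete, here is a minimal Python sketch of the general idea: each boundary point of a shape is described by its distance to the shape's centroid. This is a generic textbook formulation, not the implementation from Chapter 5; the function name, the normalization by the largest distance, and the toy contour are assumptions, and the Zernike moment and curvature parts of the combined approach are not shown.

```python
from math import hypot

def centroid_distance_signature(contour):
    """Distances from each boundary point to the centroid, scaled to [0, 1].

    `contour` is a list of (x, y) boundary points in traversal order.
    Dividing by the largest distance (an assumed normalization) makes the
    signature independent of the shape's size; subtracting the centroid
    makes it independent of the shape's position.
    """
    n = len(contour)
    cx = sum(x for x, _ in contour) / n
    cy = sum(y for _, y in contour) / n
    dists = [hypot(x - cx, y - cy) for x, y in contour]
    peak = max(dists)
    if peak == 0:
        return [0.0] * n  # degenerate contour: all points coincide
    return [d / peak for d in dists]

# Made-up rectangular contour sampled at six boundary points.
rect = [(0, 0), (2, 0), (4, 0), (4, 2), (2, 2), (0, 2)]
# The same shape scaled by 3 and shifted by (10, -5).
moved = [(3 * x + 10, 3 * y - 5) for x, y in rect]

sig_rect = centroid_distance_signature(rect)
sig_moved = centroid_distance_signature(moved)
print(sig_rect)
```

In this sketch the two signatures agree up to floating-point error, which is the translation and scale invariance that makes such a representation attractive for comparing logo-like shapes. Rotation and the choice of starting point are not handled here.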

Owing to advances in computing technologies, visual information is now used extensively in various domains, such as education, health, and digital libraries. Meanwhile, users find it increasingly difficult to recognize visual content. Although traditional content-based retrieval systems allow users to access visual information through query-by-example with low-level visual features (e.g. color, shape, and texture), the semantic gap is widely recognized as a hurdle to the practical adoption of content-based retrieval systems. Rich visual information (e.g. user-generated visual content) enables us to derive new knowledge at a large scale, which will significantly facilitate visual information management. Besides semantic concept detection, semantic relationships among concepts can also be explored in the visual domain, and not only in the traditional textual domain. In Chapter 6, Zhiyong Wang and Dagan Feng provide an overview of the state of the art in discovering semantics in the visual domain from two aspects: semantic concept detection and knowledge discovery from visual information at the semantic level. For the first aspect, various facets of visual information annotation are discussed, including content representation, machine-learning-based annotation methodologies, and widely used datasets. For the second aspect, a novel data-driven approach is introduced to discover semantic relevance among concepts in the visual domain. Future research topics are also outlined.

Despite nearly twenty years of intensive study of content-based image retrieval and annotation, these topics remain difficult. The essential challenge lies in the limitation of using low-level visual features to characterize the semantic information of images, commonly known as the semantic gap. To bridge this gap, various approaches have been proposed based on the incorporation of human knowledge and textual information, as well as on learning techniques that use information from different modalities. In addition, contextual information, which represents the relationship between different real-world or conceptual entities, has shown its significance for recognition tasks in both real-life experience and scientific studies. In Chapter 7, Zhang and Guan first review the state of the art in image annotation and retrieval. They then propose a general Bayesian framework that integrates content and contextual information, and apply it to image annotation and retrieval. The contextual information is taken to be the statistical relationship between different images for image retrieval, and between different semantic concepts for image annotation. The framework has efficient learning and classification procedures, and its effectiveness is evaluated in experimental studies, which demonstrate its advantage over both content-based and context-based approaches.

In Chapter 8, Zhang et al. present a highly scalable and adaptable co-learning framework for multimodal data mining in a multimedia database. The framework demonstrates strong scalability in the sense that the query time complexity is constant, independent of the database scale. The mining effectiveness is also independent of the database scale, which makes multimodal querying of a very large multimedia database feasible. In addition, the framework shows strong adaptability in the sense that the database indexing can be updated incrementally with a constant-time operation when the database is dynamically updated with new information. Hence, the framework surpasses many of the existing multimodal data mining methods in the literature, which are neither scalable nor adaptable. Theoretical analysis and empirical evaluations are provided to demonstrate the advantage of this scalability and adaptability. Although the framework is general and can be used for multimodal data mining in any specific domain, this study evaluates its mining performance on the Berkeley Drosophila ISH embryo image database. The study compares the framework with a state-of-the-art multimodal data mining method to demonstrate its effectiveness and promise.

Background knowledge has been investigated as a potential means to improve performance of machine learning algorithms. In Chapter 9, Taksa and Zelikovitz explore the use of machine learning for non-hierarchical classification of queries, and present an approach to background knowledge discovery by using information retrieval techniques. Two different sets of background knowledge that were obtained from the World Wide Web, one in 2006 and one in 2009, are used with the proposed approach to classify a commercial corpus of Web query data by the age of the user. In the process, various classification scenarios are generated and executed, providing insight into choice, significance and range of tuning parameters, and exploring impact of the dynamic web on classification results.

The fast proliferation of video data archives has increased the need for automatic video content analysis and semantic video retrieval. In Chapter 10, Min Chen proposes an effective temporal-based event detection framework to support high-level video indexing and retrieval. The core of the framework is a temporal association mining process that systematically captures characteristic temporal patterns for identification of interesting events. This framework effectively tackles the challenges caused by loose video structure and class imbalance issues. Another characteristic of this framework is that it offers strong generality and extensibility with the capability of exploring representative event patterns with little human interference. The temporal information and event detection results can be input into the proposed distributed video retrieval system to support the high-level semantic querying, selective video browsing and event-based video retrieval.

In Chapter 11, Ionescu et al. present an automatic content-based retrieval system for the analysis and characterization of artistic animated movies. They deal with temporal segmentation, and propose cut, fade and dissolve detection methods adapted to the constraints of this domain. Furthermore, this chapter discusses a fuzzy linguistic approach to automatic symbolic/semantic content annotation in terms of color techniques and action content. The browsing issue is addressed by providing methods for static and dynamic video abstraction. For a quick browse of the movie’s visual content, Ionescu et al. create a storyboard-like summary, while for a "sneak peek" of the movie’s exciting action content they propose a trailer-like video skim.
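
For readers unfamiliar with temporal segmentation, the following Python sketch shows one common baseline for detecting abrupt cuts: comparing the intensity histograms of consecutive frames and flagging large jumps. It is only an illustration of the general idea and is not the method of Ionescu et al.; the histogram feature, the threshold value, the function names, and the toy frames are all assumptions, and gradual transitions such as fades and dissolves would need different techniques.

```python
def frame_histogram(frame, bins=4, max_value=256):
    """Normalized intensity histogram of a frame given as a flat pixel list."""
    counts = [0] * bins
    for p in frame:
        counts[p * bins // max_value] += 1
    return [c / len(frame) for c in counts]

def detect_cuts(frames, threshold=0.5):
    """Return indices i where a cut is assumed between frame i-1 and frame i.

    The dissimilarity is half the L1 distance between consecutive
    histograms, which lies in [0, 1]; `threshold` is a hypothetical value
    that would need tuning on real footage.
    """
    cuts = []
    if not frames:
        return cuts
    prev = frame_histogram(frames[0])
    for i in range(1, len(frames)):
        cur = frame_histogram(frames[i])
        diff = sum(abs(a - b) for a, b in zip(prev, cur)) / 2
        if diff > threshold:
            cuts.append(i)
        prev = cur
    return cuts

# Made-up sequence: two dark frames followed by two bright frames.
frames = [
    [10, 20, 30, 40],
    [12, 22, 28, 41],
    [200, 210, 220, 230],
    [205, 215, 225, 235],
]
cuts = detect_cuts(frames)
print(cuts)
```

The shots delimited by such cut positions are the units from which a storyboard-like summary could later pick representative frames.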

Sports video analysis has been attracting more and more attention because of its potential commercial benefits, entertainment value and mass audience demand. Much research on shot classification, highlight extraction and event detection in sports video has been done to provide the general audience with interactive video viewing systems for quick browsing, indexing and summarization. More keenly than ever, the audience desires professional insights into the games. Coaches and players demand automatic tactics analysis and performance evaluation with the aid of multimedia information retrieval technologies. There is also a growing trend to provide computer-assisted umpiring in sports games, such as the well-known Hawk-Eye system used in tennis. Sports video analysis is therefore a research issue well worth investigating. In Chapter 12, Hua-Tsung Chen and Suh-Yin Lee review current research, give an insight into sports video analysis, and discuss potential applications and open issues.

With the rapid advancement of music compression and storage technologies, digital music can be easily created, shared and distributed on computers and numerous portable digital devices. Music is often seen as a key component of many multimedia databases, and as these databases grow in size and complexity, meaningful search and retrieval of music become important and necessary. Music information retrieval is a relatively young and challenging research area that emerged in the late 1990s. Although some forms of music retrieval are available on the Internet, these tend to be inflexible and have significant limitations. In Chapter 13, Leung et al. present an adaptive indexing approach to searching and discovering music information. High-level music semantics may be incorporated into search strategies through such an indexing architecture.

MP3 music has become very popular with the availability of powerful computation and high-bandwidth connectivity. In Chapter 14, Antonello D’Aguanno presents techniques and algorithms for content analysis of compressed audio. The chapter focuses on a number of different algorithms dealing with common tasks in music information retrieval, such as tempo induction, tempo tracking, and automatic music synchronization. It also presents an overview of the MusicXML and IEEE 1599 languages for representing scores and synchronization results, and provides related applications, conclusions, and future work in the field of direct content analysis in the compressed domain.

Most music is published in a cluster of songs, called an album, although many people enjoy individual songs, commonly called singles. In Chapter 15, Kristoffer Jensen investigates whether there is a reason for assembling full albums. Two different experiments are undertaken to investigate this issue. In the first experiment, automatic segmentation is performed on full music albums. When the segmentation falls on song boundaries, different fade-ins and fade-outs are employed and songs are seen as the homogeneous units. When boundaries are found within songs, other homogeneous units also exist. The second experiment, on sorting music by similarity, reveals the sorting complexity of music albums. If the sorting complexity is high, then the album is unordered; otherwise the album is ordered with regard to the features.