Introduction
In recent years, developments in multimedia technologies have led to rapid growth in the use of multimedia data, especially video. This growing demand for digital video data has exposed the need for content-based video retrieval systems. Content-based retrieval of videos requires extracting the semantic content of the videos. Nevertheless, the 'semantic gap' between the low-level features of multimedia data and the high-level semantic information remains a challenging problem. Thus, semantic content extraction is still a compelling research topic (Datta, Li, & Wang, 2005).
Videos, by their very nature, comprise different types of data, such as text, audio, and images. Correspondingly, the semantic information to be extracted is directly connected to these separate sources. Therefore, to provide an efficient semantic content extraction solution, this nature of multimedia data should be analyzed carefully and the contained information used thoroughly. Video data is unstructured, which leads to several complexities such as lighting variations, camera motion, occlusion, viewpoint changes, and noise in the sensed data. However, video data has another important characteristic that can help overcome most of these challenges: its multimodal content. Integrating the information obtained from multiple modalities is an empirically validated approach to increasing retrieval accuracy (Atrey, Hossain, El-Saddik, & Kankanhalli, 2010). Moreover, integration minimizes the dependence on any single modality, which yields a more robust system. Consider a people-marching event as an example, where the event can be recognized using any of the visual, audio, and textual modalities: the video can include people as visual objects, shouting sounds, and lyrics of a march in the closed-caption text. A combination of these modalities provides higher detection accuracy for the people-marching event and is less dependent on potential problems in any single modality.
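The combination described above can be illustrated with a minimal sketch of decision-level (late) fusion, in which each modality independently produces a confidence score for the event and the scores are combined by a weighted average. The function name, modality labels, weights, and scores below are illustrative assumptions, not taken from the article:

```python
def fuse_scores(scores, weights):
    """Weighted average of per-modality confidence scores.

    Modalities absent from `scores` (e.g. no closed-caption text is
    available) are simply skipped, so no single channel is
    indispensable -- the robustness property discussed above.
    """
    total = sum(weights[m] for m in scores)
    return sum(weights[m] * s for m, s in scores.items()) / total

# Illustrative weights for three information channels.
weights = {"visual": 0.5, "audio": 0.3, "textual": 0.2}

# All three modalities contribute evidence for the event.
all_three = fuse_scores(
    {"visual": 0.8, "audio": 0.7, "textual": 0.9}, weights)

# The textual modality is unavailable; detection still proceeds
# from the remaining channels.
no_text = fuse_scores({"visual": 0.8, "audio": 0.7}, weights)
```

Real fusion schemes are typically learned rather than fixed, but the sketch captures why a missing or degraded modality degrades the decision gracefully instead of breaking it.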
The information fusion literature contains a significant number of studies on multimodal information fusion. However, most of these studies do not take advantage of all available modalities; instead, they focus on particular modality pairs, especially 'audio-visual' and 'visual-textual' (Maragos, Gros, Katsamanis, & Papandreou, 2008; Atrey, Hossain, El-Saddik, & Kankanhalli, 2010). In this study, we aim to incorporate as much information as possible through the existing modalities for the purpose of semantic concept detection. Here, we consider a 'modality' to be a set of information that is complementary to the other included modalities (Wu, Chang, Chang, & Smith, 2004), and we elaborate the three information channels in video (visual, audio, textual) into the following complementary modalities: Visual-Color, Visual-Region, Visual-Texture, Audio-Perceptual, Audio-Cepstral, and Textual. Thus, we try to benefit from any useful information included in the video data and increase the retrieval accuracy of the concepts. The concepts to be predicted are referred to as semantic concepts, which constitute a class of elements that together share essential characteristics identifying the class. These semantic concepts include visual objects such as Car and Bird, events such as Biking, and other semantic concepts such as Soccer and Basketball.