An Automatic Video Reinforcing System for TV Programs using Semantic Metadata from Closed Captions


Yuanyuan Wang (Yamaguchi University, Ube, Japan), Daisuke Kitayama (Kogakuin University, Tokyo, Japan), Yukiko Kawai (Kyoto Sangyo University, Kyoto, Japan), Kazutoshi Sumiya (Kwansei Gakuin University, Sanda, Japan) and Yoshiharu Ishikawa (Nagoya University, Nagoya, Japan)
DOI: 10.4018/IJMDEM.2016010101

Abstract

There are various kinds of TV programs, such as travel and educational programs. While watching them, viewers often search the Web for information related to the programs. However, because TV programs keep playing, viewers may miss important scenes while searching, and their enjoyment can be spoiled. Another problem is that each scene in a video covers various topics, while viewers have different levels of knowledge. It is therefore important to detect the topics in videos and to supplement the videos with related information automatically. In this paper, the authors propose a novel automatic video reinforcing system with two functions: (1) a media synchronization mechanism, which presents supplementary information synchronized with videos so that viewers can effectively understand the geographic data in them; and (2) a video reconstruction mechanism, which generates new video contents based on viewers' interests and knowledge by adding and removing scenes, so that viewers can enjoy the generated videos without additional searching.

Introduction

In recent years, TV channels have grown rapidly all over the world, such as CBS (Columbia Broadcasting System) in the United States and NHK (Nippon Hoso Kyokai) in Japan. These channels broadcast many kinds of TV programs (e.g., TV shows, news, and sports events), and the videos in these programs are often accompanied by closed captions. While watching TV programs, viewers often become interested in some of the content and search the Web for related information. For example, viewers may use smartphones or tablets to look up the locations of tourist spots that appear on a travel channel, check information about a player in a live sports program, or access an online store during a fashion program (Fleites, Wang, & Chen, 2015a, 2015b). However, since TV programs keep playing and cannot be paused, viewers may miss important scenes while searching the Web, and their enjoyment can be spoiled. Another problem stems from the various topics covered in each scene of a video and viewers' different levels of knowledge. For example, some viewers may want a summary of delicious foods in Switzerland, while others may be more interested in the history of Switzerland. In other words, viewers can have different demands when watching the same TV program about world heritage sites in Switzerland, and they have to consult other TV programs or resources for the information they want, such as Swiss foods or Swiss history. It is thus difficult to meet viewers' multiple demands with only one video. Therefore, it is necessary to extract the various topics from a video, which reflect viewers' interests, and to automatically supplement each scene of the video with related information (e.g., geographic contents, web contents).

The goal of this work is to develop a novel automatic video reinforcing system by analyzing semantic metadata (geographical metadata and topical metadata) extracted from the closed captions of videos in TV programs. The proposed system includes two functions, described below: (1) a media synchronization mechanism and (2) a video reconstruction mechanism. This work extends the authors' previous work (Wang, Kawai, Sumiya, & Ishikawa, 2015), which included only the video reconstruction mechanism; the media synchronization mechanism is newly developed based on the concept of second-screen services (Nandakumar & Murray, 2014; Geerts, Leenheer, Grooff, Negenman, & Heijstraten, 2014):

  • 1.

    A media synchronization mechanism: This mechanism presents additional geographic contents (e.g., maps, Street View) synchronized with videos on large screens or smaller sub-screens (smartphones), based on the geographical relationships between every pair of location names that appear in each scene. To achieve this, the system extracts geographical metadata from each video, including a) the temporal sequence of location names appearing in the video's closed captions; and b) the geographical relationships between the locations in each scene, based on their geographical regions (see Figure 1). The authors then derive semantic structure, such as the intentions of the video, and finally determine how to present geographic contents seamlessly during the video;

  • 2.

    A video reconstruction mechanism: This mechanism integrates related web contents (e.g., YouTube video clips, images) into a video and generates new video contents based on viewers' interests and knowledge. It also removes unnecessary original scenes based on the popularity rating of each scene for its related topics. To achieve this, the system extracts topical metadata from the video, including a) the temporal sequence of topics appearing in the video's closed captions; and b) the popularity rating of each scene, based on the number of search hits for its related topics. The authors then determine which additional contents are needed and which original scenes are unnecessary, and finally determine how to generate four kinds of new video contents with level of detail (LOD) controlled under time pressure.
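The metadata extraction steps named in the two mechanisms above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the gazetteer, the `Scene` structure, and the hit counts are all hypothetical stand-ins (a real system would use a geocoding service for location regions and a web search API for hit counts).

```python
from dataclasses import dataclass, field

# Hypothetical gazetteer mapping a location name to its bounding region
# as (min_lat, min_lon, max_lat, max_lon); coordinates are approximate.
GAZETTEER = {
    "Switzerland": (45.8, 5.9, 47.8, 10.5),
    "Zurich": (47.32, 8.45, 47.43, 8.63),
    "Geneva": (46.18, 6.11, 46.23, 6.18),
}

@dataclass
class Scene:
    start: float                              # scene start time in seconds
    end: float                                # scene end time in seconds
    caption: str                              # closed-caption text of the scene
    locations: list = field(default_factory=list)

def extract_locations(scene):
    """a) Temporal sequence step: collect location names found in the
    scene's closed captions (simple substring matching for illustration)."""
    scene.locations = [name for name in GAZETTEER if name in scene.caption]
    return scene.locations

def contains(outer, inner):
    """b) Geographical relationship step: does the region of `outer`
    geographically contain the region of `inner`?"""
    o, i = GAZETTEER[outer], GAZETTEER[inner]
    return o[0] <= i[0] and o[1] <= i[1] and o[2] >= i[2] and o[3] >= i[3]

def popularity(scene, hit_counts):
    """Popularity rating of a scene: mean search-hit count over its topics
    (`hit_counts` would come from a web search API; stubbed here)."""
    hits = [hit_counts.get(topic, 0) for topic in scene.locations]
    return sum(hits) / len(hits) if hits else 0.0
```

For a scene captioned "Our tour starts in Zurich, Switzerland", `extract_locations` yields the location sequence, `contains("Switzerland", "Zurich")` captures the region containment used by the synchronization mechanism, and `popularity` gives the score the reconstruction mechanism would use to keep or drop the scene.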

Figure 1.

Geographical metadata of a video in a TV program
