Introduction
In recent years, TV channels have grown rapidly all over the world, for example CBS (Columbia Broadcasting System) in the United States and NHK (Nippon Hoso Kyokai) in Japan. These channels broadcast many kinds of TV programs (e.g., TV shows, news, and sports events), and the videos in TV programs are often accompanied by closed captions. While watching TV programs, viewers often become interested in some of the content and search the Web for related information. For example, using smartphones or tablets, viewers may search for the locations of tourist spots that appear on a travel channel, check information about a player in a live sports program, or access an online store during a fashion program (Fleites, Wang, & Chen, 2015a, 2015b). However, since TV programs keep playing and cannot be paused, viewers who search the Web may miss important scenes, which spoils their enjoyment. Another problem stems from the variety of topics in each scene of a video and the different levels of knowledge among viewers. For example, some tourists may want a summary of delicious foods in Switzerland, while others may be more interested in the history of Switzerland. In other words, viewers watching the same TV program about world heritage sites in Switzerland can have different demands. To find the information they want, such as Swiss foods or the history of Switzerland, they have to refer to other TV programs or resources, because a single video can hardly meet the multiple demands of its viewers. Therefore, it is necessary to extract the various topics from the video, which serve as viewers' interests, and to supplement the video automatically with related information (e.g., geographic contents, web contents) in each scene.
The goal of this work is to develop a novel automatic video reinforcing system that analyzes semantic metadata (geographical metadata and topical metadata) extracted from the closed captions of videos in TV programs. The proposed system includes two functions, (1) a media synchronization mechanism and (2) a video reconstruction mechanism, described below. This work extends earlier work (Wang, Kawai, Sumiya, & Ishikawa, 2015), which included only the video reconstruction mechanism. The authors develop the media synchronization mechanism based on the concept of second screen service (Nandakumar & Murray, 2014; Geerts, Leenheer, Grooff, Negenman, & Heijstraten, 2014):
1. A media synchronization mechanism: This mechanism presents additional geographic contents (e.g., map, Street View) synchronized with videos on large screens or on smaller sub-screens (smartphones), based on the geographical relationships between each pair of location names that appear in each scene. To achieve this, the system extracts the geographical metadata of each video, including (a) temporal sequences of the location names that appear in the closed captions of the video and (b) geographical relationships between the locations in each scene based on their geographical regions (see Figure 1). Then, the authors obtain the semantic structure of the video, such as its intentions. Finally, the authors determine how to present geographic contents seamlessly during the video;
2. A video reconstruction mechanism: This mechanism integrates other related web contents (e.g., YouTube video clips, images) into a video and generates new video contents based on viewers' interests and knowledge. It also removes unnecessary original scenes based on the popularity rating of each original scene on related topics. To achieve this, the system extracts the topical metadata of a video, including (a) temporal sequences of the topics that appear in the closed captions of the video and (b) a popularity rating of each scene based on the number of search hits on related topics. Then, the authors determine which other contents are necessary and which original scenes are unnecessary. Finally, the authors determine how to generate four kinds of new video contents with the level of detail (LOD) controlled under time pressure.
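The popularity-based scene selection in the second mechanism can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the normalisation to the range 0 to 1, the threshold value, and the hit counts are all assumptions made for illustration, and a real system would obtain hit counts from a search service rather than from a fixed table.

```python
def popularity_ratings(scene_topics, hit_counts):
    """Rate each scene by the mean search-hit count of its topics.

    Ratings are divided by the highest mean so they fall in [0, 1].
    (This normalisation is an assumption of the sketch.)
    """
    raw = {}
    for scene, topics in scene_topics.items():
        if topics:
            raw[scene] = sum(hit_counts.get(t, 0) for t in topics) / len(topics)
        else:
            raw[scene] = 0.0
    top = max(raw.values(), default=0.0)
    return {scene: (value / top if top else 0.0) for scene, value in raw.items()}


def scenes_to_keep(ratings, threshold=0.3):
    """Keep scenes whose rating reaches the (hypothetical) threshold."""
    return [scene for scene, rating in ratings.items() if rating >= threshold]


# Hypothetical topical metadata and hit counts, for illustration only.
scene_topics = {
    "scene1": ["fondue"],
    "scene2": ["cowbell tuning"],
    "scene3": ["Matterhorn", "fondue"],
}
hit_counts = {"fondue": 1000, "Matterhorn": 3000, "cowbell tuning": 100}

ratings = popularity_ratings(scene_topics, hit_counts)
kept = scenes_to_keep(ratings)
```

With these made-up numbers, the scene about the rarely searched topic falls below the threshold and would be a candidate for removal, while the other two scenes are kept.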
Figure 1. Geographical metadata of a video in a TV program
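As a rough illustration of the geographical metadata described above, the following Python sketch builds a temporal sequence of location names from closed captions and classifies the regional relationship between two locations. It is a sketch under stated assumptions, not the authors' method: the `Caption` type, the tiny `GAZETTEER` table, and the relationship labels ("inside", "contains", "sibling") are hypothetical, and a real system would rely on a full gazetteer or geocoding service.

```python
import re
from dataclasses import dataclass


@dataclass
class Caption:
    start: float  # seconds from the start of the video
    text: str


# Hypothetical gazetteer: location name -> enclosing region.
GAZETTEER = {
    "Switzerland": "Europe",
    "Bern": "Switzerland",
    "Zermatt": "Switzerland",
}


def location_sequence(captions):
    """Return (start_time, location) pairs in order of appearance."""
    sequence = []
    for cap in captions:
        found = []
        for name in GAZETTEER:
            # Whole-word match so that "Bern" does not match "Bernese".
            for m in re.finditer(r"\b" + re.escape(name) + r"\b", cap.text):
                found.append((m.start(), name))
        sequence.extend((cap.start, name) for _, name in sorted(found))
    return sequence


def relation(a, b):
    """Classify the regional relationship between two locations."""
    if a == b:
        return "same"
    if GAZETTEER.get(a) == b:
        return "inside"      # a lies within b
    if GAZETTEER.get(b) == a:
        return "contains"    # a encloses b
    if a in GAZETTEER and GAZETTEER[a] == GAZETTEER.get(b):
        return "sibling"     # both share one enclosing region
    return "unrelated"


# Hypothetical captions, for illustration only.
captions = [
    Caption(0.0, "Welcome to Switzerland."),
    Caption(12.5, "We start in Bern, then head to Zermatt."),
]
sequence = location_sequence(captions)
```

A relationship such as "sibling" between Bern and Zermatt could, for instance, suggest keeping the map at the same zoom level, whereas "inside" could suggest zooming in; how the actual system maps relationships to presentation is not specified in this section.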