1. Introduction
The wide availability and widespread use of digital video data result from recent advances in multimedia technology, coupled with significant increases in computer system performance and the growth of the Internet. Many applications rely on digital video libraries, including but not limited to distance learning systems, video-on-demand, and interactive TV (SenGupta, Thounaojam, Manglem, & Roy, 2015; Chen & Zhang, 2008; Money & Agius, 2008; Urhan, Güllü, & Ertürk, 2006; Ren, Jiang, & Chen, 2009). This, in turn, creates a need for efficient and reliable tools to manage video databases so that relevant material can be browsed, indexed, and retrieved (Petersohn, 2010; Xu et al., 2014; Vila, Bardera, Xu, Feixas, & Sbert, 2013; Ren et al., 2009).
A fundamental process in the automatic annotation of digital video sequences is temporal video segmentation (Chasanis, Likas, & Galatsanos, 2009; Couprie, Farabet, LeCun, & Najman, 2013; Mukherjee & Mukherjee, 2013; Cooper, Liu, & Rieffel, 2007). In this process, a video sequence is divided into a set of meaningful and manageable segments, called shots, as shown in Figure 1. A video shot is defined as an unbroken sequence of frames captured by one camera between a “record” and a “stop” operation (Chasanis et al., 2009; Petersohn, 2010; Lefèvre & Vincent, 2007; Zhang, Lin, Chen, Huang, & Liu, 2006). Transitions between shots in a video sequence take two basic forms: cuts and gradual transitions. A cut is an abrupt change of scene that occurs within a single frame when the camera is stopped and restarted, whereas a gradual transition is introduced artificially to join two shots over the span of several frames (Gao & Ma, 2014; Tavassolipour, Karimian, & Kasaei, 2014; Couprie et al., 2013; Petersohn, 2010; Zhang et al., 2006; Ren et al., 2009). Fades and dissolves are the cinematic effects most commonly used to produce gradual transitions. A fade-out is a slow decrease in intensity that ends in a black frame or a frame of a single dominant color. In contrast, a fade-in is a slow increase in intensity that starts from a black frame. A dissolve, on the other hand, superimposes the first frame of the new shot on the last frame of the previous shot, so that the previous frame grows dimmer while the new frame grows stronger (Petersohn, 2010; Yuan et al., 2007).
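To make the three gradual transitions concrete, the following minimal sketch synthesizes them with a linear pixel-wise blend. The linear blend is an assumption adopted here for illustration; it is a common way to model fades and dissolves, not necessarily the model used in the works cited above. The names `Frame`, `blend`, `dissolve`, `fade_out`, `fade_in`, and `mean_intensity` are hypothetical, frames are simplified to flat lists of grayscale intensities, and only fades to and from black are covered (not fades to a dominant color).

```python
from typing import List

# Assumption: a frame is a flat list of grayscale intensities in [0, 255].
Frame = List[float]


def blend(a: Frame, b: Frame, alpha: float) -> Frame:
    """Pixel-wise mix: alpha = 0 returns frame a, alpha = 1 returns frame b."""
    return [(1.0 - alpha) * pa + alpha * pb for pa, pb in zip(a, b)]


def dissolve(last_prev: Frame, first_next: Frame, n_frames: int) -> List[Frame]:
    """Superimpose first_next on last_prev over n_frames frames.

    The previous frame gets dimmer and the new frame gets stronger
    as the blend weight moves linearly from 0 to 1.
    """
    if n_frames < 2:
        raise ValueError("a gradual transition needs at least two frames")
    return [
        blend(last_prev, first_next, i / (n_frames - 1))
        for i in range(n_frames)
    ]


def fade_out(frame: Frame, n_frames: int) -> List[Frame]:
    """Slow decrease in intensity ending in a black frame."""
    black = [0.0] * len(frame)
    return dissolve(frame, black, n_frames)


def fade_in(frame: Frame, n_frames: int) -> List[Frame]:
    """Slow increase in intensity starting from a black frame."""
    black = [0.0] * len(frame)
    return dissolve(black, frame, n_frames)


def mean_intensity(frame: Frame) -> float:
    """Average intensity of a frame, useful for inspecting a fade."""
    return sum(frame) / len(frame)
```

Under this model a cut is simply the degenerate case with no intermediate frames: the last frame of one shot is followed directly by the first frame of the next. For a fade-out, `mean_intensity` decreases steadily across the generated frames; for a dissolve, the intermediate frames contain content from both shots.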
Video abstraction is the process of summarizing a video sequence by a set of key frames that represent the detected video shots. To keep a video database manageable, only the key frames are indexed, so video retrieval systems answer queries by computing a similarity measure between the query input and the key-frame data. Several techniques have been developed in recent years to summarize video sequences (Chen et al., 2008; Jiang, Sun, Liu, Chao, & Zhang, 2013; Chen, Ren, & Jiang, 2011). These techniques vary in their performance and complexity.
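The key-frame retrieval idea described above can be sketched as follows. The text does not name a specific similarity measure, so this example assumes histogram intersection over coarse grayscale histograms purely for illustration. The names `histogram`, `intersection`, and `retrieve`, the bin count, and the shot identifiers are all hypothetical.

```python
from typing import Dict, List


def histogram(frame: List[int], bins: int = 4, max_value: int = 256) -> List[float]:
    """Normalized grayscale histogram of a frame given as a flat pixel list."""
    counts = [0] * bins
    for pixel in frame:
        index = min(int(pixel) * bins // max_value, bins - 1)
        counts[index] += 1
    total = len(frame)
    return [count / total for count in counts]


def intersection(h1: List[float], h2: List[float]) -> float:
    """Histogram intersection: 1.0 for identical histograms, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def retrieve(query: List[int], key_frames: Dict[str, List[int]], top_k: int = 1) -> List[str]:
    """Return the ids of the top_k shots whose key frame best matches the query."""
    query_hist = histogram(query)
    scored = [
        (intersection(query_hist, histogram(frame)), shot_id)
        for shot_id, frame in key_frames.items()
    ]
    # Highest similarity first; the stable sort keeps insertion order on ties.
    scored.sort(key=lambda pair: -pair[0])
    return [shot_id for _, shot_id in scored[:top_k]]


# Only one key frame per shot is indexed, not every frame of the video.
index = {
    "shot_dark": [10, 20, 30, 40],
    "shot_bright": [200, 210, 220, 250],
}
best = retrieve([15, 25, 35, 45], index)
```

Here the dark query frame matches the key frame of `shot_dark`, because both fall entirely into the lowest histogram bin. A real system would typically use richer features than a four-bin histogram; the point is only that queries are compared against the indexed key frames rather than against the full video.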