Fast Video Shot Boundary Detection Technique based on Stochastic Model

Mohammad A. Al-Jarrah (Department of Computer Engineering, Yarmouk University, Irbid, Jordan) and Faruq A. Al-Omari (Yarmouk University, Irbid, Jordan)
Copyright: © 2016 |Pages: 17
DOI: 10.4018/IJCVIP.2016070101


A video is composed of a set of shots, where a shot is defined as a sequence of consecutive frames captured by one camera without interruption. A shot transition may be abrupt (hard cut) or gradual (fade, dissolve, or wipe). Shot boundary detection is an essential component of video processing: the detected boundaries are used in many applications, such as video indexing and video on demand. In this paper, the authors propose a new shot boundary detection algorithm that detects all types of shot boundaries with high accuracy. The algorithm is built on a global stochastic model of the video stream, which uses the joint characteristic function and, consequently, the joint moments to model the stream. The proposed algorithm was implemented and tested against different categories of videos, detecting cut, fade, dissolve, and wipe transitions. Experimental results show that the algorithm achieves high performance, as validated by the computed precision and recall rates.
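The abstract's reference to joint moments can be made concrete with a generic statistical sketch (not the authors' actual model, which is not reproduced here): the sample joint moment E[X^p·Y^q] of paired intensity samples drawn from two consecutive frames. Within a shot, such moments change slowly from frame to frame; at a shot boundary they jump. The function below is a minimal illustration under that assumption.

```python
import numpy as np

def joint_moment(x, y, p, q):
    """Sample joint moment E[X^p * Y^q] of paired intensity samples.

    x, y are flattened pixel intensities of two consecutive frames
    (same length, pixel-wise paired).
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    return np.mean((x ** p) * (y ** q))

# Toy example: three paired pixel intensities from two frames
x = np.array([10.0, 20.0, 30.0])
y = np.array([12.0, 19.0, 33.0])
m11 = joint_moment(x, y, 1, 1)   # first-order joint moment E[XY]
m10 = joint_moment(x, y, 1, 0)   # reduces to the mean of x
```

A boundary detector built on such statistics would track how the moments of consecutive frame pairs evolve over time and flag frames where they deviate sharply from the model.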
Article Preview

1. Introduction

The widespread availability and use of digital video data has emerged from recent advances in multimedia technology, coupled with a significant increase in computer system performance and the growth of the Internet. Many applications incorporate digital video libraries, including but not limited to distance-learning systems, video-on-demand, and interactive TV (SenGupta, Thounaojam, Manglem, & Roy, 2015; Chen & Zhang, 2008; Money & Agius, 2008; Urhan, Güllü, & Ertürk, 2006; Ren, Jiang, & Chen, 2009). This, in turn, entails the need for efficient and reliable tools to manage video databases for proper browsing, indexing, and retrieval of relevant material (Petersohn, 2010; Xu, et al., 2014; Vila, Bardera, Xu, Feixas, & Sbert, 2013; Ren et al., 2009).

A fundamental process in the automatic annotation of digital video sequences is temporal video segmentation (Chasanis, Likas, & Galatsanos, 2009; Couprie, Farabet, LeCun, & Najman, 2013; Mukherjee, S., & Mukherjee, D., 2013; Cooper, Liu, & Rieffel, 2007). Thereby, a video sequence is divided into a set of meaningful and manageable segments, called shots, as shown in Figure 1. A video shot is defined as an unbroken sequence of frames captured by one camera during a single “record” and “stop” operation (Chasanis, Likas, & Galatsanos, 2009; Petersohn, 2010; Lefèvre & Vincent, 2007; Zhang, Lin, Chen, Huang, & Liu, 2006). Transitions between video shots occur in two basic forms: cuts and gradual transitions. A cut is an abrupt change of camera scene that occurs within a single frame, produced by stopping and restarting the camera, whereas gradual transitions are introduced artificially to combine two shots over the span of several frames (Gao & Ma, 2014; Tavassolipour, Karimian, & Kasaei, 2014; Couprie et al., 2013; Petersohn, 2010; Zhang et al., 2006; Ren et al., 2009). Fades and dissolves are the most commonly used cinematic effects for producing gradual transitions. A fade-out is a slow decrease in intensity leading to a black frame or a dominant color; in contrast, a fade-in is a slow increase in intensity starting from a black frame. A dissolve, on the other hand, superimposes the first frames of the new shot on the last frames of the previous shot, so that the previous shot grows dimmer while the new shot grows stronger (Petersohn, 2010; Yuan et al., 2007).
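The fade and dissolve effects described above are simple linear operations on pixel intensities, which a sketch makes explicit. The following is a minimal illustration (function names and the linear-ramp assumption are ours, not taken from the paper): a fade scales one frame's intensity toward or away from black, and a dissolve is a cross-fade between the last frames of one shot and the first frames of the next.

```python
import numpy as np

def fade_out(frame, t, duration):
    """Fade-out: scale intensity linearly toward black over `duration` frames."""
    return frame.astype(np.float64) * (1.0 - t / duration)

def fade_in(frame, t, duration):
    """Fade-in: scale intensity linearly up from black."""
    return frame.astype(np.float64) * (t / duration)

def dissolve(prev_frame, next_frame, t, duration):
    """Dissolve: cross-fade in which the previous shot dims
    while the new shot strengthens."""
    alpha = t / duration
    return ((1.0 - alpha) * prev_frame.astype(np.float64)
            + alpha * next_frame.astype(np.float64))

# Example on 2x2 grayscale frames
prev = np.full((2, 2), 200.0)   # last frame of the old shot
nxt = np.full((2, 2), 100.0)    # first frame of the new shot
mid = dissolve(prev, nxt, t=5, duration=10)  # halfway through the transition
```

Because these transitions unfold gradually, frame-to-frame differences stay small throughout, which is exactly why gradual transitions are harder to detect than hard cuts.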

Figure 1.

Video structure


Video abstraction is the process of summarizing a video sequence by a set of key frames that represent the detected video shots. For a manageable video database, only the key frames are indexed; video retrieval systems then process queries based on a similarity measure between the query input and the key-frame data. Several techniques have been developed in recent years to summarize video sequences (Chen et al., 2008; Jiang, Sun, Liu, Chao, & Zhang, 2013; Chen, Ren, & Jiang, 2011). These techniques vary in their performance and complexity.
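To make the shot-detection-then-abstraction pipeline concrete, here is a deliberately simple baseline sketch, not the paper's stochastic method: hard cuts are declared where the L1 distance between consecutive frames' normalized intensity histograms exceeds a threshold, and the first frame of each resulting shot is taken as its key frame. The threshold value and the first-frame key-frame rule are illustrative assumptions.

```python
import numpy as np

def hist_l1(f1, f2, bins=32):
    """L1 distance between the normalized intensity histograms of two frames."""
    h1, _ = np.histogram(f1, bins=bins, range=(0, 256))
    h2, _ = np.histogram(f2, bins=bins, range=(0, 256))
    return np.abs(h1 / h1.sum() - h2 / h2.sum()).sum()

def detect_cuts(frames, threshold=0.5):
    """Indices i where a hard cut is declared between frames i-1 and i."""
    return [i for i in range(1, len(frames))
            if hist_l1(frames[i - 1], frames[i]) > threshold]

def key_frames(frames, cuts):
    """Abstraction rule (illustrative): the first frame of each shot."""
    starts = [0] + cuts
    return [frames[s] for s in starts]

# Synthetic sequence: two shots of constant intensity, cut at index 3
frames = [np.full((4, 4), 50.0)] * 3 + [np.full((4, 4), 200.0)] * 3
cuts = detect_cuts(frames)
keys = key_frames(frames, cuts)
```

A fixed global threshold like this handles clean cuts on synthetic data but fails on gradual transitions, whose frame-to-frame differences never exceed it; that limitation motivates model-based approaches such as the one proposed in this paper.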
