Movie Video Summarization: Generating Personalized Summaries Using Spatiotemporal Salient Region Detection

Rajkumar Kannan (Bishop Heber College, Tiruchirappalli, India), Sridhar Swaminathan (Bennett University, Greater Noida, India), Gheorghita Ghinea (School of Information Systems, Computing and Mathematics, Brunel University, Uxbridge, UK & Norwegian School of Information Technology, Oslo, Norway), Frederic Andres (National Institute of Informatics, Chiyoda City, Japan) and Kalaiarasi Sonai Muthu Anbananthen (Multimedia University Malacca, Bukit Beruang, Malaysia)
DOI: 10.4018/IJMDEM.2019070101

Video summarization condenses a video by extracting its informative and interesting segments. In this article, a novel video summarization approach is proposed based on spatiotemporal salient region detection. The proposed approach first segments a video into a set of shots, which are then ranked by their spatiotemporal saliency scores. The score for a shot is computed by aggregating the frame-level spatiotemporal saliency scores. The approach detects spatial and temporal salient regions separately, using different saliency theories related to the objects present in a visual scene. The spatial saliency of a video frame is computed from color contrast and color distribution estimates, integrated with a center prior. The temporal saliency of a video frame is estimated by integrating local and global temporal saliencies computed from patch-level optical flow abstractions. Finally, the top-ranked shots with the highest saliency scores are selected to generate the video summary. Objective and subjective experimental results demonstrate the efficacy of the proposed approach.
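
The shot ranking and selection steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract only says that frame-level scores are aggregated, so the use of the mean is an assumption, as are the greedy frame budget (`max_frames`) and the function names `rank_shots` and `select_summary`.

```python
from statistics import mean


def rank_shots(frame_scores, shot_boundaries):
    """Score each shot and return shots sorted by score, highest first.

    frame_scores: one spatiotemporal saliency score per frame.
    shot_boundaries: list of (start, end) frame indices, end exclusive.
    Assumption: the shot score is the mean of its frame scores.
    """
    scored = [
        (start, end, mean(frame_scores[start:end]))
        for start, end in shot_boundaries
    ]
    return sorted(scored, key=lambda shot: shot[2], reverse=True)


def select_summary(frame_scores, shot_boundaries, max_frames):
    """Pick top-ranked shots within a frame budget, in temporal order.

    Assumption: a greedy pass over the ranking that skips any shot
    that would exceed the budget.
    """
    chosen, used = [], 0
    for start, end, _score in rank_shots(frame_scores, shot_boundaries):
        length = end - start
        if used + length <= max_frames:
            chosen.append((start, end))
            used += length
    return sorted(chosen)  # restore the original playback order


if __name__ == "__main__":
    scores = [0.1, 0.1, 0.9, 0.8, 0.3, 0.3, 0.7, 0.7]
    shots = [(0, 2), (2, 4), (4, 6), (6, 8)]
    print(select_summary(scores, shots, max_frames=4))
```

Sorting the selected shots by start frame keeps the resulting skim in the original playback order, which is one reasonable design choice for a video skim.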

1. Introduction

In this digital era, the exponential growth of video content necessitates the development of digital assistive technologies for accessing this voluminous content (Money, & Agius, 2008). Video summarization aims to produce a compact version of a full-length video while preserving the significant content of the original (Kannan et al., 2015). It helps users understand the overall content of a video quickly and decide whether to watch the entire video.

Generally, video summaries are visualized using either keyframes or video skims (Ejaz et al., 2014). The keyframes generated by a video summarization approach are a collection of video frames that represent the diverse and important segments of a video. A video skim is a compilation of significant video segments that is shorter than the original video. Due to their high expressivity, video skims are often adopted for video summary visualization.

According to the type of content used for video analysis, recent video summarization methods can be classified into cognitive-level approaches and affective-level approaches (Tsai et al., 2013). Cognitive-level video summarization approaches (Hua et al., 2005; Rapantzikos et al., 2007; Evangelopoulos et al., 2013) estimate cognitive cues such as audio, visual and textual saliencies to identify highly important segments of a video. Saliency-based video summarization approaches utilize users' task-independent response or attention to important and unique parts of a video. Affective-level summarization approaches (Joho et al., 2009; Katti et al., 2011; Peng et al., 2011), on the other hand, generate video summaries by modeling the affective video content, exploiting users' feedback and responses while they watch a video. This class of approaches needs controlled summarization setups, and its summarization efficiency depends on the ability to capture users' responses and map those responses to the video segments.

In recent years, cognitive-level approaches have been highly preferred for video summarization due to their comparatively high efficiency and effectiveness: they are generic in nature and can be applied to videos of any domain. Among the different classes of cognitive-level video summarization approaches, visual saliency detection is considered one of the most successful and widely used mechanisms for both keyframe extraction and video skim generation.

Recent research in visual attention modelling has enabled advanced saliency theories to be exploited for video summarization. In recent years, several computational models of visual saliency have been proposed for video summarization (Ejaz et al., 2014; Ma, & Zhang, 2002; Marat et al., 2007; Longfei et al., 2008; Peng, & Xiao-Lin, 2009; Huang et al., 2011; Wang et al., 2011; Lai, & Yi, 2012; Salehin, & Paul, 2015; Ejaz et al., 2013). These approaches aim to utilize users' instinctive, stimulus-driven attention towards salient regions in a visual scene for summarizing a video. Visual saliency in a video is computed as the spatiotemporal saliency of the individual video frames. Visual saliency, or spatiotemporal saliency, detection is considered the primary step in attention-based cognitive-level video summarization approaches for detecting significant and interesting video segments. Spatiotemporal saliency in a video frame is estimated by computing the saliencies in the spatial and temporal domains separately and then integrating them with a fusion scheme (Borji, & Itti, 2012).
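
The separate-then-fuse idea can be sketched as follows. The text does not say which fusion scheme is used, so this example assumes a simple weighted linear combination of min-max normalized maps; the weight `alpha`, the function names, and the use of the mean as the frame-level score are all illustrative assumptions rather than the method of the article.

```python
def normalize(saliency_map):
    """Min-max normalize a 2D saliency map (list of rows) to [0, 1]."""
    flat = [value for row in saliency_map for value in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A constant map carries no saliency information.
        return [[0.0 for _ in row] for row in saliency_map]
    return [[(value - lo) / (hi - lo) for value in row] for row in saliency_map]


def fuse(spatial, temporal, alpha=0.5):
    """Fuse spatial and temporal maps of equal shape.

    Assumption: linear fusion, alpha * spatial + (1 - alpha) * temporal,
    applied after normalizing both maps to a common range.
    """
    s_norm, t_norm = normalize(spatial), normalize(temporal)
    return [
        [alpha * s + (1 - alpha) * t for s, t in zip(s_row, t_row)]
        for s_row, t_row in zip(s_norm, t_norm)
    ]


def frame_score(fused_map):
    """Collapse a fused map to one frame-level score (assumed: the mean)."""
    flat = [value for row in fused_map for value in row]
    return sum(flat) / len(flat)


if __name__ == "__main__":
    spatial = [[0, 2], [4, 8]]    # toy 2x2 spatial saliency map
    temporal = [[1, 1], [1, 3]]   # toy 2x2 temporal saliency map
    fused = fuse(spatial, temporal, alpha=0.5)
    print(fused, frame_score(fused))
```

Normalizing both maps before fusion keeps one domain from dominating the other purely because of its numeric range; other fusion schemes (for example multiplicative or max-based) would slot into `fuse` in the same way.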
