Unsupervised Video Object Foreground Segmentation and Co-Localization by Combining Motion Boundaries and Actual Frame Edges

Chao Zhang, Guoping Qiu
DOI: 10.4018/IJMDEM.2018100102

Abstract

In this article, the authors propose a fast and fully unsupervised approach for foreground object co-localization and segmentation in unconstrained videos. The approach first computes both the actual edges and the motion boundaries of the video frames, and then aligns them using the proposed HOG affinity map. By filling the occlusions formed by the aligned edges, more precise masks of the foreground object are obtained. Through an accumulation process, these masks are turned into a motion-based likelihood, which serves as one unary term in the proposed graph model. The other unary term, the color-based likelihood, is computed from the color distributions of the foreground and background. Experimental results show that the method detects and segments foreground objects quickly and effectively.
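The graph model is only summarized in this preview. The following is a minimal sketch of how two such unary terms could be fused by a graph cut, assuming the motion-based and color-based likelihoods are already available as per-pixel maps in [0, 1]; PyMaxflow and the constant smoothness term are stand-ins for illustration, not the authors' actual solver or pairwise term.

```python
# Minimal sketch: fuse two unary terms (motion- and color-based
# likelihoods) with a pixel-grid graph cut. Assumes PyMaxflow.
import numpy as np
import maxflow  # pip install PyMaxflow

def segment(motion_lh, color_lh, smoothness=2.0, eps=1e-6):
    """motion_lh, color_lh: HxW foreground likelihood maps in [0, 1]."""
    fg = np.clip(0.5 * (motion_lh + color_lh), eps, 1 - eps)
    cost_fg = -np.log(fg)       # unary cost of labeling a pixel foreground
    cost_bg = -np.log(1 - fg)   # unary cost of labeling a pixel background
    g = maxflow.Graph[float]()
    nodes = g.add_grid_nodes(fg.shape)
    g.add_grid_edges(nodes, smoothness)   # constant 4-connected pairwise term
    g.add_grid_tedges(nodes, cost_fg, cost_bg)
    g.maxflow()
    return g.get_grid_segments(nodes)     # True = foreground pixel
```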
1. Introduction

The detection of video foreground objects is one of the key problems in video processing; it directly facilitates applications such as video object recognition (Prest, Leistner, Civera, Schmid, & Ferrari, 2012), action segmentation (Lu & Jason, 2015), and action recognition (Wang & Schmid, 2013). In recent years, many successful video foreground segmentation approaches (Grundmann, Kwatra, Han, & Essa, 2010; Jang, Lee, & Kim, 2016) and object detectors (Cho, Kwak, Schmid, & Ponce, 2015; Girshick, 2015) have been proposed, leveraging the understanding of high-level video content.

However, building a fully unsupervised model to segment the foreground object in unconstrained videos is still challenging, as no additional information about the foreground is provided, and the videos may be affected by factors such as dynamic backgrounds, motion blur, lighting changes, or even editing artifacts (e.g., subtitles or flying logos) (Papazoglou & Ferrari, 2013). Segmentation therefore depends more heavily on low-level cues, namely motion cues and appearance cues. Optical flow is a popular example of a motion cue (Figure 1).

Figure 1.

An illustration of the proposed motion-based likelihood: (a) the original frame; (b) the actual edges of the current frame (green) and the motion boundaries (red); (c) the HOG affinity map, where motion boundaries that agree more with the edge responses receive higher scores; (d) the edges aligned by motion; (e) the initial foreground prediction by the inside-outside map (Papazoglou & Ferrari, 2013); (f) the motion-based likelihood obtained by accumulating the masks from step (e)
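As a concrete illustration of the optical-flow motion cue in steps (a)-(b), the sketch below computes dense flow between two consecutive frames with OpenCV's Farnebäck method and takes the spatial gradient magnitude of the flow field as a simple motion-boundary response; this is an illustrative stand-in, not necessarily the flow method or boundary detector used in the paper.

```python
# Sketch: dense optical flow and a simple motion-boundary response.
import cv2
import numpy as np

def motion_boundaries(prev_bgr, next_bgr):
    prev = cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2GRAY)
    nxt = cv2.cvtColor(next_bgr, cv2.COLOR_BGR2GRAY)
    # Farnebäck dense flow: HxWx2 array of per-pixel displacements.
    flow = cv2.calcOpticalFlowFarneback(prev, nxt, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    # Motion boundaries live where the flow field changes abruptly,
    # so take the gradient magnitude over both flow channels.
    dxs = [cv2.Sobel(flow[..., c], cv2.CV_32F, 1, 0, ksize=3) for c in (0, 1)]
    dys = [cv2.Sobel(flow[..., c], cv2.CV_32F, 0, 1, ksize=3) for c in (0, 1)]
    mag = np.sqrt(sum(d ** 2 for d in dxs + dys))
    return mag / (mag.max() + 1e-6)   # normalized boundary strength
```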


With adequate alignment, the motion boundaries produced by optical flow become more effective, since they often correspond to the actual edges of foreground objects (Li, Kim, Humayun, Tsai, & Rehg, 2013). On the other hand, unsupervised models such as the PHM (Cho, Kwak, Schmid, & Ponce, 2015) can serve as good appearance models, while the colour distribution is a simpler but efficient option (Koh, Jang, & Kim, 2016). In fact, a more precise foreground estimate also significantly benefits colour-model-based foreground prediction (Stretcu & Leordeanu, 2015).
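To make the colour-distribution cue concrete, the following is a minimal sketch of a colour-based likelihood: given an initial foreground mask, it builds coarse RGB histograms for foreground and background and scores each pixel by the resulting likelihood ratio. The binning and smoothing values are assumptions, not the authors' settings.

```python
# Sketch: per-pixel color-based foreground likelihood from histograms.
import numpy as np

def color_likelihood(frame, init_mask, bins=16):
    """frame: HxWx3 uint8; init_mask: HxW bool initial foreground."""
    idx = (frame // (256 // bins)).astype(np.int32)        # HxWx3 bin ids
    flat = (idx[..., 0] * bins + idx[..., 1]) * bins + idx[..., 2]
    n = bins ** 3
    fg_hist = np.bincount(flat[init_mask], minlength=n) + 1.0   # Laplace
    bg_hist = np.bincount(flat[~init_mask], minlength=n) + 1.0  # smoothing
    fg_hist /= fg_hist.sum()
    bg_hist /= bg_hist.sum()
    p_fg, p_bg = fg_hist[flat], bg_hist[flat]
    return p_fg / (p_fg + p_bg)       # HxW likelihood map in (0, 1)
```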

In this paper, we address the problem of automatically locating and segmenting the foreground object in unconstrained videos and propose a fully unsupervised approach based on both the motion and appearance cues of the object. The key contributions of our work are: (1) a fully unsupervised approach for video foreground object segmentation with competitive performance on three datasets; (2) more precise motion-based foreground predictions obtained with a novel HOG affinity map; and (3) a demonstration that shallow image processing algorithms are still capable of handling complex vision tasks such as video foreground segmentation. The experimental results show that our approach obtains competitive results on the YouTube-Objects, J-HMDB, and VOS datasets.
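The preview does not detail how the HOG affinity map is constructed. The sketch below shows one plausible reading: HOG descriptors are computed cell by cell on the actual-edge map and on the motion-boundary map, and each cell's affinity is the cosine similarity between the two descriptors, so motion boundaries that agree with edge responses score higher. The function name and all parameters here are assumptions.

```python
# Sketch: cell-wise HOG affinity between the actual-edge map and the
# motion-boundary map (both HxW float arrays). Uses scikit-image.
import numpy as np
from skimage.feature import hog

def hog_affinity(edge_map, boundary_map, cell=8):
    kwargs = dict(orientations=9, pixels_per_cell=(cell, cell),
                  cells_per_block=(1, 1), feature_vector=False)
    h_e = hog(edge_map, **kwargs)      # shape (nr, nc, 1, 1, 9)
    h_b = hog(boundary_map, **kwargs)
    e = h_e.reshape(*h_e.shape[:2], -1)
    b = h_b.reshape(*h_b.shape[:2], -1)
    # Cosine similarity per cell: high where motion boundaries agree
    # with the frame's edge responses.
    num = (e * b).sum(-1)
    den = np.linalg.norm(e, axis=-1) * np.linalg.norm(b, axis=-1) + 1e-6
    return num / den                   # (nr, nc) affinity map in [0, 1]
```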

2. Related Work

Classic background subtraction methods perform well on foreground segmentation with stationary cameras or slow background motion; in unconstrained videos, however, the background is more complex and harder to analyse (Koh, Jang, & Kim, 2016). In fact, locating the foreground object in videos is a difficult task, and many efforts have been made over the past decade.
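For reference, a classic background subtraction baseline of the kind referred to here is available directly in OpenCV; a minimal usage sketch (the file name is illustrative):

```python
# Sketch: classic background subtraction (MOG2) as a baseline.
import cv2

cap = cv2.VideoCapture("video.mp4")     # illustrative input path
subtractor = cv2.createBackgroundSubtractorMOG2(history=500,
                                                varThreshold=16,
                                                detectShadows=True)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg_mask = subtractor.apply(frame)   # 255 = foreground, 127 = shadow
    # fg_mask degrades with moving cameras and dynamic backgrounds,
    # which is exactly the unconstrained-video difficulty noted above.
cap.release()
```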

Given an unconstrained video, no prior knowledge (such as colour or location) of the objects is provided (Lee, Kim, & Grauman, 2011); thus, many approaches adopt saliency-based measures to find the most salient object in the frame, such as (Li, Zheng, Chen, & Zhou, 2017) and (Li, Xia, & Chen, 2018). Another popular direction is spatio-temporal saliency, whose key concern is the object saliency map (Qiu, Gu, Chen, Chen, & Wang, 2007; Guo, 2008; Liu, 2009).
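As an example of such a saliency-based measure, OpenCV's contrib module ships a spectral-residual static saliency detector (a classic per-frame saliency measure, not the specific methods cited above); a minimal sketch, assuming opencv-contrib-python is installed and using an illustrative input file:

```python
# Sketch: per-frame saliency via spectral residual (opencv-contrib).
import cv2

frame = cv2.imread("frame.png")                     # illustrative input
detector = cv2.saliency.StaticSaliencySpectralResidual_create()
ok, saliency_map = detector.computeSaliency(frame)  # float map in [0, 1]
if ok:
    # Threshold the map to get a rough "most salient object" mask.
    mask = (saliency_map * 255).astype("uint8")
    _, binary = cv2.threshold(mask, 0, 255,
                              cv2.THRESH_BINARY | cv2.THRESH_OTSU)
```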
