Video Object Segmentation

Ee Ping Ong (Institute for Infocomm Research, Singapore) and Weisi Lin (Nanyang Technological University, Singapore)
Copyright: © 2009 | Pages: 8
DOI: 10.4018/978-1-59904-845-1.ch106
Video object segmentation aims to extract the individual video objects from a video (i.e., a sequence of consecutive images). It has attracted considerable interest and substantial research effort over the past decade because it is a prerequisite for visual content retrieval (e.g., MPEG-7 related schemes), object-based compression and coding (e.g., MPEG-4 codecs), object recognition, object tracking, security video surveillance, traffic monitoring for law enforcement, and many other applications. Although it is not standardized, video object segmentation is an indispensable component of any complete MPEG-4/7 solution: MPEG-4 object-based video coding, for instance, can be used only after video object segmentation has extracted the required video object masks. Video object segmentation is even more important in military applications such as real-time remote identification and tracking of missiles, vehicles, and soldiers. Other possible applications include home, office, and warehouse security, where the tasks of interest when an intruder appears are monitoring and recording intruders or foreign objects, alerting the personnel concerned, and transmitting the segmented foreground objects over a bandwidth-limited channel. A fully automatic video object segmentation tool therefore has very wide practical application in everyday life, where it can improve efficiency and save time, manpower, and cost.
To segment objects from video sequences, temporal information, spatial information, and appropriate combinations of the two have been extensively exploited (Aach & Kaup, 1993; Bors & Pitas, 1998; Castagno, Ebrahimi, & Kunt, 1998; Chen, Chen, & Liao, 2000; Chen & Swain, 1999; Chien, Ma, & Chen, 2002; Cucchiara, Onfiani, Prati, & Scarabottolo, 1999; Kim, Choi, Kim, Lee, Lee, Ahn, & Ho, 1999; Kim, Jeon, Kwak, Lee, & Ahn, 2001; Koller, Weber, Huang, Malik, Ogasawara, Rao, & Russel, 1994; Li et al., 2001; Li, Tye, Ong, Lin, & Ko, 2002; Li, Gu, Leung, & Tian, 2004; Liu, Hong, Herman, & Chellappa, 1998; Liu, Chang, & Chang, 1998; Mech & Wollborn, 1997; Meier & Ngan, 1999; Mester & Aach, 1997; Neri, Colonnese, Russo, & Talone, 1998; Odobez & Bouthemy, 1998; Ong & Spann, 1999; Shao, Lin, & Ko, 1998a, 1998b; Toklu, Tekalp, & Erdem, 2000). Fully automatic extraction of semantically meaningful objects from video is extremely useful in many practical applications, but existing methods suffer from a limited domain of application, ad hoc design, the need for excessive parameter and threshold setting and fine-tuning, and overly complicated algorithms. At the current level of algorithm development, in general only supervised segmentation approaches (e.g., Castagno et al., 1998; Toklu et al., 2000) are capable of detecting semantic objects from video with higher accuracy. Supervised approaches can be found in applications such as studio editing and content retrieval.
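As a rough illustration of what using temporal information can mean in its simplest form, the sketch below thresholds the absolute difference between two consecutive grayscale frames to obtain a binary change mask. It is a generic frame-differencing example and is not taken from any of the works cited above; the function name `change_mask`, the threshold value of 25, and the tiny 4x4 frames are assumptions chosen only for illustration.

```python
def change_mask(prev_frame, curr_frame, threshold=25):
    """Return a binary mask with 1 where the absolute intensity change
    between two equally sized grayscale frames exceeds `threshold`.

    Frames are lists of rows of integer intensities (0-255).
    The default threshold is an arbitrary illustrative value.
    """
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]


# Two hypothetical 4x4 frames: a uniform dark background, then the same
# background with a bright 2x2 "object" appearing in the centre.
frame_a = [[10] * 4 for _ in range(4)]
frame_b = [row[:] for row in frame_a]
for r in (1, 2):
    for c in (1, 2):
        frame_b[r][c] = 200

mask = change_mask(frame_a, frame_b)
for row in mask:
    print(row)
```

A mask of this kind only marks pixels that changed between frames. It carries no notion of semantic objects and is sensitive to noise and to the choice of threshold, which is one instance of the parameter-tuning burden mentioned above.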

