Copy Detection Using Graphical Model: HMM for Frame Fusion

Shikui Wei, Yao Zhao, Zhenfeng Zhu
DOI: 10.4018/978-1-4666-1891-6.ch014


With the growing popularity of video sharing websites and editing tools, it is easy for people to incorporate video content from different sources into their own work, which raises copyright problems. Content-based video copy detection attempts to track the usage of copyright-protected video content using video analysis techniques, dealing not only with whether a copy occurs in a query video stream but also with where the copy is located and where it originates from. While much work has addressed the problem with good performance, less effort has been devoted to copy detection in the case of a continuous query stream, for which precise temporal localization and complex video transformations such as frame insertion and video editing need to be handled. In this chapter, the authors attack the problem by employing a graphical model to facilitate the frame fusion based video copy detection approach. The key idea is to convert the frame fusion problem into a graph model decoding problem with a temporal consistency constraint and three relaxed constraints. This work employs an HMM to perform frame fusion and proposes a Viterbi-like algorithm to speed up the frame fusion process.
Chapter Preview


By analyzing the video archives in large-scale network databases, researchers have found that certain video content copied from the same source frequently occurs in many different videos due to its popularity or importance, such as popular network videos and important news shots. Generally, the use of those popular video clips is not authorized by the original authors or organizations, and tracking those clips is an important problem for digital copyright protection and law enforcement investigations. As an alternative to watermarking, content-based video copy detection (CBCD) offers a quite different approach to media tracking and copyright protection. Watermarking techniques require some secret information to be embedded in the target video and then perform copyright detection by retrieving that secret information. This means the secret information must be embedded before the video archive is distributed. In practice, it may be difficult to fulfill this requirement, since huge amounts of video data have already been distributed without such processing. In contrast, CBCD does not pose any additional requirements (Kim, 2005); it directly detects copies by matching a query video against a reference database (Gengembre, 2008; Joly, 2007; Law-To, 2006).

Formally, CBCD refers to judging whether a query video contains any content originating from copyright-protected video via feature extraction and matching techniques (Yang, 2003). The key challenge in CBCD is how to precisely localize the pair of a copy and its original clip in both the query video stream and the reference database despite the various video transformations applied to the copy. This challenge becomes more difficult and complicated as the size of the reference database increases. To this end, a lot of work has been done in recent years. Earlier work focused mainly on frame feature extraction and video matching based on aligned frames. For example, the methods reported in (Chen, 2008; Hua, 2004; Kim, 2005; Lee, 2008; Oostveen, 2002) treat a whole query video as a detection unit and attempt to match it with all possible subsequences of equal length within a long reference video, where a threshold determines whether there is a copy. However, those schemes fail in many practical applications where only a small segment of the query video is a copy (Law-To, 2006). An example is a broadcast stream in which only some clips are potential copies. Therefore, more flexible detection methods are needed to address this issue. More recently, frame fusion based methods have made it possible to detect copied segments (Douze, 2008; Gengembre, 2008; Hampapur, 2002; Kim, 2008). These methods first search the reference database and return a list of similar reference frames for each query frame; the copies are then determined by fusing these returned reference frames according to a temporal consistency assumption. However, those methods generally process the query video in batch, that is, all query frames in one batch must be parsed beforehand. This may limit the applicability or performance of those schemes for detecting copies in a continuous query video stream, such as a broadcast video stream.
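The temporal consistency assumption behind these frame fusion methods can be illustrated with a small sketch: a genuine copy shows up as many query frames whose retrieved reference frames share a roughly constant temporal offset within one reference video. The function below is a minimal, hypothetical illustration of this voting idea; the function name, vote threshold, and offset binning are assumptions for exposition, not the chapter's actual method:

```python
# Sketch: fuse per-frame retrieval results by voting on the temporal offset
# (ref_frame - query_frame). A copied segment yields many votes in one bin.
from collections import defaultdict

def fuse_by_temporal_consistency(candidates, min_votes=3, offset_tol=2):
    """candidates: list over query frames; candidates[t] is a list of
    (ref_video_id, ref_frame_idx) pairs returned by similar-frame search.
    Returns (ref_video_id, first_query_frame, last_query_frame) triples."""
    votes = defaultdict(list)  # (ref_video, quantized offset) -> query frames
    for t, hits in enumerate(candidates):
        for ref_video, ref_frame in hits:
            offset_bin = (ref_frame - t) // offset_tol  # coarse offset bin
            votes[(ref_video, offset_bin)].append(t)
    detections = []
    for (ref_video, _), frames in votes.items():
        if len(frames) >= min_votes:  # enough temporally consistent matches
            detections.append((ref_video, min(frames), max(frames)))
    return detections
```

Batch methods of this kind need the whole run of query frames before voting, which is exactly the limitation for continuous streams noted above.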

To address the above problem, we consider a frame fusion based copy detection approach, which detects copies by similar-frame search and frame fusion under a temporal consistency assumption. In this chapter, our work focuses mainly on the critical frame fusion stage, which performs copy determination and temporal localization by employing a graphical model (here, an HMM). To improve fusion efficiency, the proposed scheme employs a Viterbi-like dynamic programming algorithm that comprises an on-line back-tracking strategy with three relaxed constraints, namely, the emission constraint, the transition constraint, and the gap constraint. In particular, when a new query frame is read and a list of similar reference frames is retrieved for it, the emission constraint and transition constraint are used to build transition relationships between reference frames in the current list and reference frames in previous lists. Finally, the gap constraint is employed to determine the starting and ending positions of complete paths. Using the on-line back-tracking, we can obtain a few complete paths at the current time instant, which correspond to the original video clips. Note that the starting and ending positions indicate the boundaries of potential copies in the query video stream.
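As a rough illustration of how such on-line fusion might look, the sketch below greedily extends candidate paths under simplified stand-ins for the three constraints: an emission threshold on similarity, a bounded forward jump in the reference index (transition), and a maximum number of unmatched frames before a path is closed (gap). All names and thresholds here are illustrative assumptions, not the authors' actual HMM decoding algorithm:

```python
# Sketch: on-line path building over per-query-frame candidate lists.
def viterbi_fuse(candidate_lists, emission_min=0.5, max_jump=3, max_gap=2):
    """candidate_lists[t]: list of (ref_frame_idx, similarity) for query
    frame t. Returns paths as (start_q, end_q, start_ref, end_ref)."""
    active, complete = [], []
    for t, hits in enumerate(candidate_lists):
        # emission constraint: keep only sufficiently similar candidates
        hits = [(r, s) for r, s in hits if s >= emission_min]
        for r, _ in hits:
            grew = False
            for p in active:
                # transition constraint: reference index must advance,
                # but only by a bounded jump; one extension per frame
                if 0 < r - p["last_ref"] <= max_jump and t > p["last_q"]:
                    p["last_q"], p["last_ref"] = t, r
                    grew = True
                    break
            if not grew:  # start a new candidate path
                active.append({"start_q": t, "start_ref": r,
                               "last_q": t, "last_ref": r})
        # gap constraint: close paths not extended for more than max_gap frames
        still = []
        for p in active:
            if t - p["last_q"] > max_gap:
                if p["last_q"] > p["start_q"]:  # keep non-trivial paths only
                    complete.append((p["start_q"], p["last_q"],
                                     p["start_ref"], p["last_ref"]))
            else:
                still.append(p)
        active = still
    for p in active:  # flush surviving paths at end of stream
        if p["last_q"] > p["start_q"]:
            complete.append((p["start_q"], p["last_q"],
                             p["start_ref"], p["last_ref"]))
    return complete
```

Because paths are closed as soon as the gap constraint fires, detections become available while the stream is still being read, which is the property that distinguishes this style of fusion from the batch methods discussed earlier.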

To facilitate the following discussions, we clarify some terms used in this chapter. A copy refers to a video clip originating from a copyright-protected video. "Original video" and "reference video" are used interchangeably throughout the chapter, both referring to the copyright-protected video.
