Recent Progress in Image and Video Segmentation for CBVIR

Recent Progress in Image and Video Segmentation for CBVIR

Yu-Jin Zhang (Tsinghua University, Beijing, China)
DOI: 10.4018/978-1-60566-026-4.ch515


A simple search from EI Compendex by using the term “image segmentation” only in title field could produce around 5000 records (Zhang, 2006). However, as no general theory for image segmentation for different application domains, particular algorithms have been developed. The domain of Content-Based Image Retrieval (CBIR) is such a typical example, where many specific techniques have been proposed. An introduction focused on research works before 2004 can be found in Zhang (2005). This paper is an up-to-date and extended version from CBIR to CBVIR (Content-Based Visual Information Retrieval) by including CBVR (Content- Based Video Retrieval), which focused on the progress in last 3 years, and especially on video segmentation.
Chapter Preview


A Formal Definition of Image Segmentation

Image segmentation is the first step and also one of the most critical tasks of image analysis. It is often described as the process that subdivides an image into its constituent parts and extracts those parts of interest (objects).

A formal definition of image segmentation, supposing the whole image is represented by f(x, y), and fi(x, y),i = 1, 2, …, n are disjoint non-empty regions of f(x, y), consists of the following conditions (Fu, 1981):

  • (1)


  • (2)

    For all i and j, ij, there exits;

  • (3)

    For i = 1, 2, …, n, it must have;

  • (4)

    For all ij, there exits;

where P[fi(x, y)] is a uniformity predicate for all elements in fi(x, y) and ∅ represents an empty set.

Considering the real situation in practice, the following condition can be added:

  • (5)

    For all i = 1, 2, …, n, fi(x, y) is a connected component.

In the above conditions, condition (1) points out that the summation of segmented regions could include all pixels in an image; condition (2) points out that different segmented regions could not overlap each other; condition (3) points out that the pixels in the same segmented regions should have some similar properties; condition (4) points out that the pixel belonging to different segmented regions should have some different properties; and, finally, condition (5) points out that the pixels in the same segmented region are connected.

Key Terms in this Chapter

Graph Cuts: A widely used technique in energy minimization and clustering. When there are only two labels, the energy minimization problem can be reduced directly to a problem of computing the max-flow (or min-cut) of a graph.

Gaussian Mixture Model: A traditional parameter-based clustering model. All samples are assumed to belong to one of several Gaussian distributions, the ensemble can be described by the sum of several Gaussian distribution.

Watershed: A transform or an algorithm for image segmentation. It is traditionally classified as a region-based segmentation approach. The idea underlying watershed transform comes from geography.

Highest Confidence First: An efficient technique for energy minimization. It is a deterministic minimization algorithm, which is suitable for assigning label to pixels with unknown label, because it introduces an uncommitted label.

AR Model Order: It characterizes how close the relation is for the frame subsequence in the visual content.

K-mean Model: A traditional parameter-based clustering model. K initial classes are represented by their mean values, the following iterations try to minimize the distance between the samples with their respect means.

Feature Space Analysis: Image analysis performed in the feature space, in which each point corresponds to a feature value extracted from image.

Clustering: A process to group, based on some defined criteria, two or more terms together to form a large collection. In the context of image segmentation, clustering is to gather several pixels or groups of pixels with similar property to form a region.

Complete Chapter List

Search this Book: