Discriminative Feature Selection in Image Classification and Retrieval

Shang Liu (Beihang University, China) and Xiao Bai (Beihang University, China)
DOI: 10.4018/978-1-4666-1891-6.ch011

Abstract

In this chapter, the authors present a new method to improve the performance of current bag-of-words based image classification. After feature extraction, they introduce a pairwise image matching scheme to select the discriminative features. Only the label information from the training sets is used to update the feature weights via an iterative matching process. The selected features correspond to the foreground content of the images and thus highlight the high-level category knowledge of the images. Visual words are then constructed from these selected features. The method can be used as a refinement step in existing image classification and retrieval pipelines. The authors demonstrate the effectiveness of their method on three tasks: supervised image classification, semi-supervised image classification, and image retrieval.

Introduction

Image classification and retrieval are important research topics in computer vision, pattern recognition, and machine learning. Earlier approaches could handle images with simple backgrounds. Today, however, images found on the web or on personal computers usually contain complex backgrounds and varied depictions of the same object. Recent research has put considerable effort into handling such complex images. One major line of work extracts local invariant features from images. Well-known contributions include SIFT (Lowe, 2004), PCA-SIFT (Ke, 2004), SURF (Bay, 2008) and, more recently, Local Self-Similarity (LSS) (Shechtman, 2007). These descriptors can be used together with scale-invariant regions (Mikolajczyk et al., 2005) to extract invariant features within those regions. Other work focuses mainly on building structural or statistical learning frameworks. Structure-based methods (Bai, 2009) (Bunke, 1998) (Xia, 2009) aim to extract structural invariants that characterize the objects contained in the images. Statistical learning (Fergus, 2003) (Weber, 2000) (Chum, 2007), on the other hand, tries to characterize the low-level invariants via complex statistical models, which comprise the positions and the scalar values of the local invariant feature descriptors.

One typical example is the bag-of-words approach (Li, 2005), which originates from document analysis. In computer vision, the basic bag-of-words approach can be described as follows: an image is treated as a special kind of document, and features extracted from the image are considered "visual words." Images can then be analyzed by counting the frequency of the meaningful visual words and represented by a visual word frequency histogram. Traditional pattern analysis methods such as the Support Vector Machine (SVM) (Duda, 2002), the Gaussian Mixture Model (GMM) (Figueiredo, 2002) or Linear Discriminant Analysis (Duda, 2002) can then be used for recognition and classification. Although the approach is simple and effective, one major problem is that the constructed visual words are general rather than discriminative. The reason is that any feature extraction method extracts not only foreground features but also background features from an image. An example is given in Figure 1, where the SIFT features are overlaid on both the foreground and background parts of the image. When all the extracted features are used to construct the visual words, many visual words may correspond to the background of the image. Even for images of the same category, the background varies significantly. The features we want should be discriminative enough to carry category-specific information, yet the background features contain little discriminative information. This affects both the visual word construction and the classification steps, so the classification and recognition process is degraded in both accuracy and efficiency. The existing statistical and structure-based methods suffer from the same problem. If discriminative features can be extracted, then the high-level category information can be better characterized and the classification or retrieval performance can be improved.
In this chapter, we investigate whether discriminative features can be extracted to highlight the category information while diminishing the background information.
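
As a rough illustration of the basic bag-of-words representation described above (not the feature selection method proposed in this chapter), the following Python sketch builds a visual word histogram from local descriptors. It assumes the descriptors are already available as a NumPy array, for example from a SIFT extractor, and it uses a toy k-means clustering to form the vocabulary, which is a common choice but an assumption here. The function names build_vocabulary, assign_words and bow_histogram are hypothetical and chosen only for this example.

```python
import numpy as np


def assign_words(descriptors, vocabulary):
    """Return, for each descriptor, the index of its nearest visual word."""
    # Squared Euclidean distances, shape (n_descriptors, n_words).
    diffs = descriptors[:, None, :] - vocabulary[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)
    return np.argmin(dists, axis=1)


def build_vocabulary(descriptors, n_words, n_iters=20, seed=0):
    """Toy k-means (Lloyd's algorithm); an assumed way to form visual words."""
    rng = np.random.default_rng(seed)
    # Initialise the words with randomly chosen, distinct descriptors.
    idx = rng.choice(len(descriptors), size=n_words, replace=False)
    vocabulary = descriptors[idx].astype(float)
    for _ in range(n_iters):
        labels = assign_words(descriptors, vocabulary)
        for k in range(n_words):
            members = descriptors[labels == k]
            if len(members) > 0:  # keep the old centre if a cluster is empty
                vocabulary[k] = members.mean(axis=0)
    return vocabulary


def bow_histogram(descriptors, vocabulary):
    """Normalised frequency histogram of visual words for one image."""
    labels = assign_words(descriptors, vocabulary)
    counts = np.bincount(labels, minlength=len(vocabulary)).astype(float)
    total = counts.sum()
    return counts / total if total > 0 else counts


if __name__ == "__main__":
    # Two hand-picked 2-D "visual words" and four descriptors from one image.
    vocab = np.array([[0.0, 0.0], [10.0, 10.0]])
    desc = np.array([[0.0, 1.0], [1.0, 0.0], [0.0, 0.0], [10.0, 9.0]])
    # Three of the four descriptors fall nearest to word 0.
    print(bow_histogram(desc, vocab))
```

Note that every descriptor contributes equally to the histogram in this sketch, whether it comes from the foreground or the background, which is exactly the limitation the chapter sets out to address.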

Figure 1.

Feature extraction example: both foreground and background features are extracted at the same time. Left: the original image, showing a car tire. Right: the same image after SIFT extraction, where the dots mark the SIFT features.
