Machine Learning for Image Classification

Yu-Jin Zhang (Department of Electronic Engineering, Tsinghua University, China)
Copyright: © 2015 | Pages: 12
DOI: 10.4018/978-1-4666-5888-2.ch021

Chapter Preview



Currently, the typical framework adopted by the majority of existing image classification systems is the discriminative model (Grauman, 2005; Lazebnik, 2006; van Gemert, 2010; Wang, 2010; Yang, 2009). Initially, the bag-of-words model (whose set of visual words is also called a codebook or dictionary), which treats an image as an unordered collection of "visual words," was the most commonly used method in image classification (Grauman, 2005). Although it achieves some satisfactory results, the bag-of-words model has two drawbacks. One is that spatial information is lost because the "visual words" are unordered, which severely limits classification performance. The other is that each feature is hard-assigned to exactly one word, and this hard decision causes large reconstruction errors. For the former, the spatial pyramid matching method has achieved remarkable success and has become an indispensable step in image classification (Lazebnik, 2006). For the latter, in order to resolve visual-word ambiguity, various approaches have been suggested, such as the kernel codebook (van Gemert, 2010) and locality-constrained linear coding, which represents each feature by a linear combination of its N nearest codebook bases (Wang, 2010). Furthermore, sparse-coding-based dictionary learning, which represents each feature by a sparse linear combination of several bases, has been proposed (Yang, 2009) and achieves state-of-the-art performance.
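As a minimal sketch of the bag-of-visual-words representation described above (NumPy only; the codebook here is random rather than learned by clustering, and all sizes are illustrative):

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Hard-assign each local descriptor to its nearest visual word
    and return the normalized word-count histogram for the image."""
    # squared distances between every descriptor and every codeword
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)                      # hard assignment
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 128))      # 8 visual words, 128-D (SIFT-like)
descriptors = rng.normal(size=(50, 128))  # 50 local descriptors from one image
h = bow_histogram(descriptors, codebook)  # one normalized histogram per image
```

In a real system the codebook would be learned by clustering local descriptors (e.g. k-means over SIFT features), and the histogram would be fed to a classifier such as an SVM; the hard `argmin` assignment here is exactly the step that the soft-assignment and sparse-coding methods above are designed to relax.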

However, the reconstruction-error criterion mainly measures how well the mapping expresses the data when transforming low-level descriptors into compact mid-level features. For the classification task, merely minimizing the reconstruction error is far from enough: the optimal dictionary should also be able to distinguish different classes. Hence, discriminative information can be incorporated by minimizing the loss of mutual information between features and labels during the quantization step (Lazebnik, 2008). Besides, a training approach for a discriminative dictionary based on sparse coding has been proposed that is formulated with respect to the sparse codes rather than the pooling results (Mairal, 2008); it therefore requires each code to be labeled and ignores global image statistics.
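One common way to write such a discriminative objective (the symbols here are illustrative, not the chapter's notation) is to augment the reconstruction-plus-sparsity cost with a classification loss on the codes:

```latex
\min_{D,\,W,\,\{\alpha_i\}} \;
\sum_i \Big( \|x_i - D\alpha_i\|_2^2
  \;+\; \lambda \|\alpha_i\|_1
  \;+\; \gamma\, \ell\big(y_i,\; f(\alpha_i; W)\big) \Big)
```

where $D$ is the dictionary, $\alpha_i$ the sparse code of descriptor $x_i$, $y_i$ its class label, $f(\cdot\,; W)$ a classifier acting on the codes, and $\gamma$ balances reconstruction quality against discriminative power. Setting $\gamma = 0$ recovers purely reconstructive dictionary learning.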

Key Terms in this Chapter

Sparse Coding: A class of unsupervised methods for learning sets of over-complete bases to represent data efficiently. Given a potentially large set of input patterns, sparse coding algorithms attempt to automatically find a small number of representative patterns which, when combined in the right proportions, reproduce the original input patterns. The sparse code for an input then consists of the coefficients of those representative patterns.
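A minimal sketch of computing such a code, using ISTA, a standard proximal-gradient solver for the l1-regularized least-squares (lasso) formulation of sparse coding; the dictionary and sizes below are illustrative:

```python
import numpy as np

def sparse_code_ista(x, D, lam=0.1, n_iter=200):
    """Compute the sparse code a of x over dictionary D by solving
    min_a 0.5*||x - D a||^2 + lam*||a||_1 with ISTA."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the smooth part
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)           # gradient of the quadratic term
        z = a - grad / L
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return a

rng = np.random.default_rng(0)
D = rng.normal(size=(8, 16))               # over-complete: 16 atoms in 8-D
D /= np.linalg.norm(D, axis=0)             # unit-norm atoms
x = D[:, 0]                                # a signal that is exactly one atom
a = sparse_code_ista(x, D)                 # code concentrates on atom 0
```

Because the soft-threshold step sets small coefficients exactly to zero, the resulting code is sparse even though the dictionary is over-complete.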

Image Classification: Associating images with semantic labels that abstractly represent the image contents. To achieve this goal, various machine learning and pattern recognition techniques can be used.

Manifold: A topological space in mathematics, and a central concept in many parts of geometry. In such a space, the region near each point resembles Euclidean space. More precisely, each point of an n-D manifold has a neighborhood that is homeomorphic to n-D Euclidean space, though globally a manifold might not be.

Machine Learning (ML): A powerful tool for pattern classification. It uses the theory of statistics in building mathematical models, and programs computers to optimize a performance criterion using example data or past experience.

Dictionary Learning: Learning the basis set (dictionary) used in sparse coding, which models data vectors as sparse linear combinations of basis elements, so as to adapt the dictionary to specific data in the image processing domain.
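A toy sketch of dictionary learning by alternating minimization (sparse-code the data, then refit the dictionary by least squares); the ISTA coder, regularization values, and synthetic data are illustrative, not a production algorithm:

```python
import numpy as np

def _ista(X, D, A, lam, n_inner):
    """Batch ISTA: sparse-code every column of X over dictionary D."""
    L = np.linalg.norm(D, 2) ** 2
    for _ in range(n_inner):
        Z = A - D.T @ (D @ A - X) / L
        A = np.sign(Z) * np.maximum(np.abs(Z) - lam / L, 0.0)
    return A

def learn_dictionary(X, n_atoms, lam=0.2, n_outer=15, n_inner=50):
    """Alternate (1) sparse coding of all samples and (2) a ridge-regularized
    least-squares dictionary update, renormalizing atoms to unit norm."""
    rng = np.random.default_rng(0)
    D = rng.normal(size=(X.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0)
    A = np.zeros((n_atoms, X.shape[1]))
    for _ in range(n_outer):
        A = _ista(X, D, A, lam, n_inner)
        D = X @ A.T @ np.linalg.inv(A @ A.T + 1e-6 * np.eye(n_atoms))
        D /= np.linalg.norm(D, axis=0) + 1e-12
    A = _ista(X, D, A, lam, n_inner)   # final codes for the final dictionary
    return D, A

# synthetic data generated from a hidden dictionary with sparse codes
rng = np.random.default_rng(1)
D_true = rng.normal(size=(8, 12))
D_true /= np.linalg.norm(D_true, axis=0)
A_true = rng.normal(size=(12, 100)) * (rng.random((12, 100)) < 0.25)
X = D_true @ A_true
D, A = learn_dictionary(X, n_atoms=12)
```

Practical systems use more careful updates (e.g. K-SVD or stochastic/online schemes) but follow the same two-step alternation.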

Locally Linear Embedding (LLE): An eigenvector method for solving the problem of nonlinear dimensionality reduction. Dimensionality reduction by LLE succeeds in identifying the underlying structure of the manifold.
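A minimal NumPy sketch of LLE (dense distance matrix and full eigendecomposition, so only suitable for small point sets; the regularization constant and sizes are illustrative):

```python
import numpy as np

def lle(X, n_neighbors=5, n_components=1, reg=1e-3):
    """Minimal LLE: reconstruct each point from its k nearest neighbors,
    then find low-D coordinates preserved by the same reconstruction
    weights (bottom eigenvectors of (I - W)^T (I - W))."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)               # exclude the point itself
    nbrs = np.argsort(d2, axis=1)[:, :n_neighbors]
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[nbrs[i]] - X[i]                  # centred neighborhood
        G = Z @ Z.T
        G += reg * np.trace(G) * np.eye(n_neighbors)  # regularized Gram
        w = np.linalg.solve(G, np.ones(n_neighbors))
        W[i, nbrs[i]] = w / w.sum()            # weights sum to one
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, 1:n_components + 1]         # skip the constant eigenvector

t = np.linspace(0, 3, 40)                      # 1-D parameter of a curve
X = np.column_stack([t, np.sin(t)])            # curve embedded in 2-D
Y = lle(X, n_neighbors=5, n_components=1)      # recovered 1-D coordinates
```

On this toy curve the recovered coordinate varies (up to sign) with the curve's own parameter, which is the sense in which LLE identifies the manifold's underlying structure.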

Spatial Pyramid Matching (SPM): A simple and computationally efficient extension of pyramid matching that captures perceptually salient features through locally orderless matching within increasingly fine sub-regions, thereby incorporating spatial information.
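A sketch of the SPM pooling step (the level weighting below is one common choice, with finer grids weighted more heavily; exact weights and grid depths vary across implementations):

```python
import numpy as np

def spm_features(positions, words, n_words, img_size, n_levels=3):
    """Spatial pyramid pooling: build a visual-word histogram in each cell
    of 1x1, 2x2 and 4x4 grids and concatenate the weighted histograms,
    so word counts retain coarse spatial layout."""
    H, W = img_size
    feats = []
    for level in range(n_levels):
        cells = 2 ** level
        weight = 2.0 ** (level - n_levels + 1)   # finer grids weighted more
        for cy in range(cells):
            for cx in range(cells):
                in_cell = ((positions[:, 0] * cells // H == cy) &
                           (positions[:, 1] * cells // W == cx))
                hist = np.bincount(words[in_cell], minlength=n_words)
                feats.append(weight * hist)
    return np.concatenate(feats).astype(float)

rng = np.random.default_rng(0)
pos = rng.integers(0, 64, size=(200, 2))     # (row, col) of 200 local features
wds = rng.integers(0, 16, size=200)          # their visual-word indices
f = spm_features(pos, wds, n_words=16, img_size=(64, 64))
```

With 3 levels the feature length is `n_words * (1 + 4 + 16)`; the level-0 block is just the plain bag-of-words histogram, so SPM strictly extends the orderless representation rather than replacing it.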
