1. Introduction
Most existing image classification algorithms treat categories as completely independent, both visually and semantically. However, humans are believed to use semantic relations when classifying categories (Collin, 2005). For example, it is unreasonable to distinguish “truck” from “vehicle”, since a “truck” is a kind of “vehicle”. In addition, humans commonly use different features to discriminate between different objects. For example, “wheel” is a useful feature for distinguishing “car” from “animal”, while shape differences are more discriminative for distinguishing “truck” from “sedan”.
Most existing image classification algorithms (Shao, 2014; Wang, 2010; Zhang, 2014) perform well on relatively easy datasets such as Caltech 101 (Fei-Fei, 2007) and Caltech 256 (Griffin, 2007). However, because they neglect semantics, they not only achieve limited results on challenging problems such as fine-grained image classification (Deng, 2009; Welinder, 2010), but are also at odds with the human visual system.
An ontology is a hierarchical structure consisting of categories and high-level relations such as “is-a” and “part-of”. It encodes semantics in a hierarchical way that is very similar to human perception, and therefore provides a useful tool for incorporating semantics into traditional image classification frameworks. However, traditional ontology-based algorithms (Marszalek, 2007; Tsai, 2010; Xu, 2014) build ontological classifiers that place a classifier at every ontological node to discriminate among that node's sub-categories. This simple framework leads to error propagation: if an image is misclassified at any intermediate node along the path from the root concept to the leaf concept, the final prediction will be wrong. The issue is serious because super-categories have large intra-class variations, i.e., it is difficult to train a good classifier for general concepts such as “animal” and “vehicle”. As a result, previous uses of ontologies in image classification have mainly aimed at improving classification speed rather than classification accuracy.
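The error propagation described above can be illustrated with a minimal sketch. The toy ontology, the function names, and the per-node classifiers below are hypothetical and are not taken from the cited methods. The sketch only shows that a top-down traversal commits to one child per node, and that, under the added assumption of independent errors, the per-node accuracies multiply along the path.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical toy ontology ("is-a" relations): internal node -> child concepts.
ONTOLOGY: Dict[str, List[str]] = {
    "entity": ["animal", "vehicle"],
    "animal": ["cat", "dog"],
    "vehicle": ["truck", "sedan"],
}


def classify_top_down(
    image: object,
    node_classifiers: Dict[str, Callable[[object], str]],
    root: str = "entity",
) -> List[str]:
    """Follow one classifier decision per internal node, from root to leaf.

    A wrong decision at any internal node can never be corrected further
    down, because only the chosen subtree is explored afterwards.
    """
    path = [root]
    node = root
    while node in ONTOLOGY:  # stop once a leaf concept is reached
        node = node_classifiers[node](image)
        path.append(node)
    return path


def path_accuracy(per_node_accuracies: Sequence[float]) -> float:
    """Chance that every decision on the path is right.

    Assumes independent errors at each node (an illustrative assumption).
    """
    result = 1.0
    for accuracy in per_node_accuracies:
        result *= accuracy
    return result
```

For instance, if the root classifier sends a truck image to “animal”, the traversal can only end at “cat” or “dog”, however accurate the lower-level classifiers are.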
In contrast to the fixed structures of ontologies, decision trees have the advantage of a flexible structure. Previous decision-tree-based approaches fall into two directions according to their splitting method. Approaches of the first direction (Belgiu, 2014; Yao, 2011) use random splits, which randomly partition the categories into two subsets at each tree node. Approaches of the second direction (Fan, 2014; Liu, 2013) use visual splits, which group categories with similar visual appearance together at each tree node. However, random splits do not leverage any prior knowledge of the data, so their discriminative power is weak. Visual splits, on the other hand, become too costly as the number of images and categories grows.
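The two splitting strategies can be contrasted with a small sketch. It is an illustration under stated assumptions, not the procedure of any cited work: the function names are hypothetical, each category is assumed to be summarised by a mean feature vector, and a plain 2-means clustering stands in for the visual grouping step.

```python
import random
from typing import Dict, List, Sequence, Tuple


def random_split(
    categories: Sequence[str], rng: random.Random
) -> Tuple[List[str], List[str]]:
    """Random split: shuffle the categories and cut them in half.

    No image data is consulted, so the split carries no prior knowledge.
    """
    shuffled = list(categories)
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def visual_split(
    class_means: Dict[str, Sequence[float]], iterations: int = 10
) -> Tuple[List[str], List[str]]:
    """Visual split: 2-means over per-category mean feature vectors.

    Expects at least two categories. Computing class_means requires
    features for every image, which is where the cost grows with data size.
    """
    names = sorted(class_means)
    # Seed the two centres with the most distant pair of categories.
    pairs = [(p, q) for i, p in enumerate(names) for q in names[i + 1:]]
    a, b = max(pairs, key=lambda pq: _sq_dist(class_means[pq[0]], class_means[pq[1]]))
    centres = [list(class_means[a]), list(class_means[b])]
    groups: Tuple[List[str], List[str]] = ([], [])
    for _ in range(iterations):
        groups = ([], [])
        for name in names:  # assign each category to its nearest centre
            d0 = _sq_dist(class_means[name], centres[0])
            d1 = _sq_dist(class_means[name], centres[1])
            groups[0 if d0 <= d1 else 1].append(name)
        for k in (0, 1):  # move each centre to the mean of its group
            if groups[k]:
                dim = len(centres[k])
                centres[k] = [
                    sum(class_means[n][j] for n in groups[k]) / len(groups[k])
                    for j in range(dim)
                ]
    return groups
```

With class means in which “truck” and “sedan” lie close together and far from “cat” and “dog”, `visual_split` separates vehicles from animals, whereas `random_split` may pair “truck” with “cat”.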