Recent (Dis)similarity Measures Between Histograms for Recognizing Many Classes of Plant Leaves: An Experimental Comparison

Mauricio Orozco-Alzate (Universidad Nacional de Colombia, Manizales, Colombia)
Copyright: © 2020 | Pages: 24
DOI: 10.4018/978-1-7998-1839-7.ch008

Abstract

The accurate identification of plant species is crucial in botanical taxonomy as well as in related fields such as ecology and biodiversity monitoring. In spite of the recent developments in DNA-based analyses for phylogeny and systematics, visual leaf recognition is still commonly applied for species identification in botany. Histograms, along with the well-known nearest neighbor rule, are often a simple but effective option for the representation and classification of leaf images. Such an option relies on the choice of a proper dissimilarity measure to compare histograms. Two state-of-the-art measures—called weighted distribution matching (WDM) and Poisson-binomial radius (PBR)—are compared here in terms of classification performance, computational cost, and non-metric/non-Euclidean behavior. They are also compared against other classical dissimilarity measures between histograms. Even though PBR gives the best performance at the highest cost, it is not significantly better than other classical measures. Non-Euclidean/non-metric nature seems to play an important role.

Introduction

Biologists and ecologists have turned their attention to engineering, particularly to image processing and pattern recognition techniques, with the aim of developing visual recognition systems that support the time-consuming and often ambiguous task of identifying the species of individual organisms. Such identification is crucial in several tasks including, among others, refinement of taxonomic classification (Seeland et al., 2019), biodiversity estimation (Peng et al., 2018), detection of invaders (Pyšek et al., 2013), monitoring of endangered species (Omer et al., 2016) and assessment of ecosystem health (Li et al., 2014). Exemplary cases of such visual recognition systems are those developed through interdisciplinary efforts of botanists and engineers for the automated classification of plant leaves.

According to Agarwal et al. (2006), the following reasons motivate the development of visual recognition systems for the leaf-based identification of plant species: (i) providing field botanists with a tool for rapid in situ comparison of their collected specimens against prototype ones from reference collections stored in herbaria; (ii) accelerating the identification of potentially novel species, because the destruction of habitats is also occurring at dramatic speed; and (iii) providing easy access to descriptive information and metadata of the species. Many different techniques have been used for leaf recognition systems; see for instance the ones reviewed by Cope et al. (2012) and Wäldchen et al. (2018), as well as other studies not covered there such as Larese et al. (2014), Fan et al. (2015), Grinblat et al. (2016), Wäldchen & Mäder (2018), Nguyen Thanh et al. (2018) and Lorieul et al. (2019).

Leaf recognition can be roughly decomposed into the four stages of a conventional pattern recognition system, namely acquisition, preprocessing, representation and classification. The first stage typically corresponds to a scanner or a camera that provides high-quality photographs. The second one is aimed at conditioning the images, via operations such as filtering, segmentation and binarization, so that the subsequent stage becomes easier. The third one consists of extracting from the preprocessed images a set of features (characteristic measurements stored as a vector) that are known to be discriminative; examples include descriptors of shape, contour, texture and color, as well as histograms. The last stage is fed with a collection of examples such that a classification algorithm, either based on probabilities or (dis)similarities, assigns a class label (the name of the species) to the image of the examined leaf.

In the last stage, classification, almost all the available classifiers have been used for leaf recognition, from simple ones such as the nearest neighbor rule, decision trees and Bayesian classifiers (Rahmani et al., 2016) to complex ones such as support vector machines and, more recently, convolutional neural networks (Barré et al., 2017; Nguyen Thanh et al., 2018; Lorieul et al., 2019). However, in spite of the current availability of many sophisticated methods, the nearest neighbor classifier is still a competitive tool in terms of accuracy. It is also very intuitive, which makes it easy to understand why a given class label was assigned (Duin et al., 2014). Moreover, the performance of the nearest neighbor classifier can be significantly boosted by choosing a proper dissimilarity measure according to the nature of the representation that is derived from the raw input images.
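As a rough illustration of how the nearest neighbor rule depends on the chosen dissimilarity measure, the following minimal Python sketch classifies a query histogram by its closest prototype. The chi-squared dissimilarity is used here only as a familiar classical example and is an assumption of this sketch; it is not WDM or PBR, whose definitions are not given in this preview. The function names and the toy histograms are hypothetical.

```python
def chi_squared(h, g):
    """Chi-squared dissimilarity between two histograms of equal length."""
    total = 0.0
    for a, b in zip(h, g):
        if a + b > 0:  # skip bins that are empty in both histograms
            total += (a - b) ** 2 / (a + b)
    return 0.5 * total


def nearest_neighbor(query, prototypes, labels, dissimilarity=chi_squared):
    """Return the label of the prototype that is least dissimilar to the query."""
    best = min(range(len(prototypes)),
               key=lambda i: dissimilarity(query, prototypes[i]))
    return labels[best]


# Toy normalized histograms (made-up values, not real leaf data).
prototypes = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]
labels = ["species_a", "species_b"]
predicted = nearest_neighbor([0.6, 0.3, 0.1], prototypes, labels)
```

Passing a different function as `dissimilarity` changes the measure without touching the classifier, which is the kind of comparison the chapter carries out.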

Key Terms in this Chapter

Dissimilarity Measure: A measure to judge nearness or closeness between either the objects themselves or their representations. It should, at least, fulfill the reflexivity condition.

Non-Euclideaness: Deviation from the Euclidean behavior. In practice, such a deviation occurs when triplets of dissimilarities do not obey the relation established by the Pythagorean theorem. For a dissimilarity matrix, the degree of non-Euclideaness can be estimated by the so-called negative eigenfraction.
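A minimal sketch of how a negative eigenfraction might be computed is given below. It assumes one commonly cited formulation: the dissimilarity matrix is squared and double-centered into a Gram matrix, and the sum of the magnitudes of its negative eigenvalues is divided by the sum of the magnitudes of all eigenvalues. The exact definition used in the chapter is not shown in this preview, and the function name is hypothetical.

```python
import numpy as np


def negative_eigenfraction(D):
    """Estimate the negative eigenfraction of a symmetric dissimilarity matrix D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    G = -0.5 * J @ (D ** 2) @ J                  # Gram matrix of the embedding
    eigvals = np.linalg.eigvalsh((G + G.T) / 2)  # symmetrize against round-off
    total = np.abs(eigvals).sum()
    if total == 0:
        return 0.0
    return float(np.abs(eigvals[eigvals < 0]).sum() / total)


# Distances between three points on a line: Euclidean, so the value is ~0.
euclidean_D = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
# A matrix that cannot come from points in a Euclidean space: value is > 0.
non_euclidean_D = [[0, 1, 5], [1, 0, 1], [5, 1, 0]]
```

A value near zero suggests that the dissimilarities can be embedded almost perfectly in a Euclidean space, while larger values indicate a stronger deviation.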

Histogram: Count-based model of a probability distribution over a discrete number of bins.

Plant Leaf Recognition: The task of identifying plant species based on discriminant properties of their leaves.

Computational Cost: The amount of time required to complete a certain operation. Even though computational cost also involves other computer resources (memory, power supply, etc.), it typically refers to computation time.

Non-Metricity: Violation of the triangle inequality by a triplet of dissimilarities. For a dissimilarity matrix, the degree of non-metricity can be estimated by the so-called non-metric fraction.
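The sketch below shows one way a non-metric fraction could be estimated, assuming it is defined as the share of ordered triplets of distinct objects that violate the triangle inequality. The counting convention and the small tolerance `tol` are assumptions of this sketch, since the chapter's exact normalization is not shown in this preview.

```python
from itertools import permutations


def non_metric_fraction(D, tol=1e-12):
    """Fraction of ordered triplets (i, j, k) with D[i][j] > D[i][k] + D[k][j]."""
    n = len(D)
    violations = 0
    total = 0
    for i, j, k in permutations(range(n), 3):
        total += 1
        if D[i][j] > D[i][k] + D[k][j] + tol:
            violations += 1
    return violations / total if total else 0.0


metric_D = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
# Going from object 0 to object 2 directly (5) costs more than via object 1 (1 + 1).
non_metric_D = [[0, 1, 5], [1, 0, 1], [5, 1, 0]]
```

The triple loop costs on the order of N cubed operations, so for large collections a random sample of triplets may be a more practical estimate.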

Leave-One-Out Classification Error: Estimation of the performance by training the classifier N times, in such a way that each object is used once for testing while the remaining N – 1 objects are used for training.
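For the nearest neighbor rule, the leave-one-out error can be sketched directly from a precomputed N x N dissimilarity matrix, because "training" amounts to storing the objects. The function name and the toy matrix below are hypothetical and only illustrate the procedure.

```python
def loo_nearest_neighbor_error(D, labels):
    """Leave-one-out error of the 1-NN rule from an N x N dissimilarity matrix."""
    n = len(labels)
    errors = 0
    for i in range(n):
        # Object i is the test object; the other N - 1 objects are the training set.
        nearest = min((j for j in range(n) if j != i), key=lambda j: D[i][j])
        if labels[nearest] != labels[i]:
            errors += 1
    return errors / n


# Toy matrix with two well-separated pairs of objects (made-up values).
toy_D = [[0, 1, 5, 6],
         [1, 0, 4, 5],
         [5, 4, 0, 1],
         [6, 5, 1, 0]]
error = loo_nearest_neighbor_error(toy_D, ["a", "a", "b", "b"])
```

Excluding index `i` from the search is what leaves the test object out; no retraining is needed for this classifier.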
