Image segmentation consists of subdividing an image into its constituent parts and extracting the parts of interest (objects). Because of its importance in image analysis, this process has been the subject of a great deal of research. After 40 years of development, a large number of image (and video) segmentation techniques have been proposed and used in various applications (Zhang, 2006). With so many algorithms developed, some effort has also been spent on their evaluation, resulting in around 100 evaluation papers published in the last century. Several studies have attempted to characterize these existing evaluation methods (Zhang, 1993; Zhang, 1996; Zhang, 2001).

Segmentation evaluation methods can be classified into analytical methods and empirical methods (Zhang, 1996). Analytical methods treat the segmentation algorithms directly by examining their principles, while empirical methods judge the segmented image (according to predefined criteria or by comparison with a reference image) so as to assess the performance of the algorithms indirectly. In practice, empirical evaluation is more effective and usable than analytical evaluation (Zhang, 1996). Recent advances in segmentation evaluation have come mainly from the development of empirical evaluation techniques.

After listing the evaluation criteria and methods proposed in the last century as background, this article summarizes recent (21st century) research on the empirical evaluation of image segmentation. These new works are classified into three groups: (1) those based on existing techniques, (2) those made with modifications of existing techniques, and (3) those based on ideas different from those of existing techniques. A comparison of these evaluation methods is made before turning to future trends and the conclusion.
Empirical evaluation methods can be classified into a goodness method group and a discrepancy method group (Zhang, 1996). The two groups use different empirical criteria for judging the performance of segmentation algorithms. Goodness methods can perform the evaluation without reference images, while discrepancy methods need reference images to arbitrate the quality of segmentation. Zhang (1996) discusses in detail the eight most commonly used criteria (three goodness criteria and five discrepancy criteria). All these criteria have been grouped into a table in Zhang (2001, pp. 148-151), which is reproduced in Table 1.
Table 1. A list of empirical criteria and their method groups
| Class | Criterion name | Method group |
| --- | --- | --- |
| D-1 | Number of mis-segmented pixels | Discrepancy |
| D-2 | Position of mis-segmented pixels | Discrepancy |
| D-3 | Number of objects in the image | Discrepancy |
| D-4 | Feature values of segmented objects | Discrepancy |
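As an illustration of how a discrepancy criterion compares a segmented image with a reference image, the following minimal sketch counts mis-segmented pixels in the spirit of criterion D-1. It assumes both images are given as equally sized nested lists of integer labels that share the same labeling convention. The function names `count_missegmented` and `missegmentation_rate` are hypothetical and are not taken from the cited works, which may define the criterion differently (for example, per class).

```python
def count_missegmented(segmented, reference):
    """Count pixels whose label differs from the reference (a D-1 style count).

    Both arguments are assumed to be lists of rows of integer labels
    with identical shape and a shared labeling convention.
    """
    if len(segmented) != len(reference):
        raise ValueError("images must have the same number of rows")
    errors = 0
    for seg_row, ref_row in zip(segmented, reference):
        if len(seg_row) != len(ref_row):
            raise ValueError("rows must have the same length")
        # Each position where the labels disagree is one mis-segmented pixel.
        errors += sum(1 for s, r in zip(seg_row, ref_row) if s != r)
    return errors


def missegmentation_rate(segmented, reference):
    """Fraction of pixels that are mis-segmented (0.0 means perfect agreement)."""
    total = sum(len(row) for row in reference)
    if total == 0:
        raise ValueError("reference image is empty")
    return count_missegmented(segmented, reference) / total


# Toy 3x4 example: label 1 marks the object, label 0 the background.
reference = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
]
segmented = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]

print(count_missegmented(segmented, reference))    # number of disagreeing pixels
print(missegmentation_rate(segmented, reference))  # same count divided by 12
```

A goodness method, by contrast, would take only `segmented` as input, since it does not rely on a reference image.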
Key Terms in this Chapter
Segmentation Comparison: An inter-algorithm process of segmentation evaluation. The purpose of comparing different algorithms is to rank their performance, to provide guidelines for choosing suitable algorithms for particular applications, and to promote new developments by effectively combining the strong points of several algorithms.
Composite Criteria: Criteria formed by combining several performance metrics in order to better cover the various aspects of segmentation algorithms. The combination can be made in different ways, such as by linear combination or by a machine learning approach.
Segmentation Characterization: An intra-algorithm process of segmentation evaluation. The purpose of evaluating a specific algorithm is to quantitatively characterize its behavior on various images and/or to help set its parameters appropriately for different applications so that the algorithm achieves its best performance.
Image Segmentation: A process that consists of subdividing an image into its constituent parts and extracting the parts of interest (objects) from the image. It is a fundamental step and a critical task in image analysis.
Subjective Criteria: Criteria based on human judgment or perception, which reflect some desirable properties of segmented images. They are used in empirical goodness methods for segmentation evaluation.
Objective Criteria: Criteria based on objectively determined quantities or values, which indicate the difference between the segmented images and reference images. They are mostly used in empirical discrepancy methods for segmentation evaluation.
Evaluation Criteria: Criteria used in the evaluation process to judge the performance of the segmentation algorithms under consideration. They are also called performance metrics, performance measures, or performance indices.
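The linear combination mentioned under Composite Criteria can be sketched as a weighted average of individual metric values. This is only an illustrative sketch: the metric names, the weights, and the function `composite_score` are hypothetical, and it assumes every metric has already been normalized to a common scale on which lower values are better.

```python
def composite_score(metrics, weights):
    """Weighted average of metric values (a simple linear combination).

    `metrics` and `weights` are dicts keyed by metric name. All metrics are
    assumed to be normalized to the same scale and orientation beforehand.
    """
    if set(metrics) != set(weights):
        raise ValueError("metrics and weights must have the same keys")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(weights[name] * metrics[name] for name in metrics)
    return weighted_sum / total_weight


# Hypothetical normalized error values for one segmentation result.
metrics = {"pixel_error": 0.10, "object_count_error": 0.50}
# Hypothetical weights: pixel-level agreement counts three times as much.
weights = {"pixel_error": 3.0, "object_count_error": 1.0}

print(composite_score(metrics, weights))
```

Dividing by the sum of the weights keeps the composite value on the same scale as the individual metrics, which makes scores obtained with different weightings easier to compare.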