Scene-Based Priors for Bayesian Semantic Image Segmentation

Christopher Menart (Ohio State University, USA), James W. Davis (Ohio State University, USA), Muhammad N. Akbar (Ohio State University, USA) and Roman Ilin (Air Force Research Laboratory, USA)
Copyright: © 2019 | Pages: 14
DOI: 10.4018/IJSST.2019010101


Based on the observation that semantic segmentation errors are partially predictable, this study proposes a compact formula that uses the confusion statistics of a trained classifier to refine (re-estimate) the initial label hypotheses. The proposed strategy is contingent upon computing the classifier confusion probabilities for a given dataset and estimating a relevant prior on the object classes present in the image to be classified. This study provides a procedure to robustly estimate the confusion probabilities and explores multiple prior definitions. Experiments compare performance on multiple challenging datasets, using different priors to improve a state-of-the-art semantic segmentation classifier. The study demonstrates the potential to significantly improve semantic labeling and motivates future work on reliable label prior estimation from images.


Semantic segmentation is a challenging computer vision problem wherein a class label is assigned to each pixel in an image. This provides much richer scene information than traditional image classification, and it is therefore an inherently more difficult task. While image classification requires only detection of the primary object, semantic segmentation requires precise, localized pixel-wise detections. Annotating training images for supervised semantic segmentation is also more expensive and time-consuming than annotating them for image classification, and as a result, most public datasets are relatively small in comparison to those for image classification (e.g., ImageNet) (Deng, Todorovic, & Latecki, 2015).

Current semantic segmentation methods show promising results on multiple datasets. However, consistent errors in their output can readily be found. For example, RefineNet (Lin et al., 2017) misclassifies ground as sidewalk so often in the PASCAL-Context dataset (Mottaghi et al., 2014) that a pixel classified as sidewalk has a 60% chance of truly being ground. In this example, pixel accuracy would actually be improved by simply relabeling all sidewalk pixels as ground. However, we would prefer not to lose all ability to classify sidewalks, but rather to exploit our knowledge of this confusion to reason about the appropriate labels to assign and the corresponding confidence we should have.
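The 60% figure above is a conditional probability of the true label given the predicted label. As a minimal sketch, assuming we have (true, predicted) label pairs for a set of pixels, it can be estimated by normalizing joint counts by the predicted-label totals. The function name and the pixel counts below are hypothetical, chosen only to mirror the quoted figure; they are not taken from the paper.

```python
from collections import Counter


def posterior_true_given_predicted(pairs):
    """Estimate p(true label | predicted label) from (true, predicted) pairs."""
    joint = Counter(pairs)
    predicted_totals = Counter(pred for _, pred in pairs)
    return {
        (true, pred): count / predicted_totals[pred]
        for (true, pred), count in joint.items()
    }


# Hypothetical pixel counts (assumption): 100 pixels predicted as sidewalk,
# of which 60 are truly ground, plus 100 correctly predicted ground pixels.
pairs = (
    [("ground", "sidewalk")] * 60
    + [("sidewalk", "sidewalk")] * 40
    + [("ground", "ground")] * 100
)

posterior = posterior_true_given_predicted(pairs)
print(posterior[("ground", "sidewalk")])
```

With these made-up counts, relabeling every sidewalk prediction as ground would make 60 of those 100 pixels correct instead of 40, at the cost of never predicting sidewalk, which is the trade-off the paragraph above argues against.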

We categorize labeling errors into two types. An “in-context” labeling error is when a pixel is assigned an incorrect label from the set of actual labels for a given image (e.g., a foreground object label is assigned to an incorrect location). An “out-of-context” labeling error is an invalid pixel label assignment that is outside of the actual label set for the image (e.g., an indoor object label is assigned to a pixel in an image of an outdoor scene). Examples of these errors are shown in Figure 1.
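The two error types can be told apart with a simple membership test against the set of labels actually present in the image. The sketch below is illustrative only: the function name `error_type` and the example label set are assumptions, loosely modeled on the kinds of labels found in PASCAL-Context.

```python
def error_type(predicted, true, image_labels):
    """Classify one pixel's prediction as correct, in-context, or out-of-context.

    image_labels is the set of ground-truth labels present anywhere in the image.
    """
    if predicted == true:
        return "correct"
    # A wrong label that does occur elsewhere in the image is an in-context error;
    # a wrong label that occurs nowhere in the image is an out-of-context error.
    return "in-context" if predicted in image_labels else "out-of-context"


# Hypothetical indoor image whose ground truth contains these labels (assumption).
image_labels = {"sofa", "floor", "wall"}

sofa_as_floor = error_type("floor", "sofa", image_labels)
wall_as_motorbike = error_type("motorbike", "wall", image_labels)
print(sofa_as_floor, wall_as_motorbike)
```

Note that this test requires the true label set of the image, which is available only at evaluation time; at inference time it would have to be replaced by an estimated prior over labels.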

Given that a classifier will have similar performance characteristics on related images, an analysis of the classification errors and label confusions across a dataset should support a secondary (post-processing) refinement stage to re-estimate the output label likelihoods. These more informed classifications should greatly reduce the incidence of in-context errors and make out-of-context errors less likely. The proposed approach is based on these motivations.

Our method is derived from a direct marginalization of p(l|d), the probability of label l given input data d, which decomposes the formulation into classifier output label probabilities and learned classifier/truth confusions. The framework treats the confidence levels of the original classifier output as partial evidence and incorporates the confusion information to determine the final probability of witnessing each object label at each location in an image. In a sense, this confusion approach can be considered a form of context-aware re-estimation. We present a robust method to compute the confusion probabilities and also outline various label priors used to bias the refinement. Upper-bound performance of the framework with various priors is reported for three challenging datasets to justify the approach.
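The exact formula is not given in this part of the text, so the following is only one plausible reading of the marginalization described above, not the authors' implementation. It assumes that p(l|d) is expanded over the classifier output label c as the sum over c of p(l|c) p(c|d), and that p(l|c) is obtained by Bayes' rule from the confusion probabilities p(c|l) and a label prior p(l). The function name `refine`, the dictionary layout, and all numbers are hypothetical.

```python
def refine(classifier_probs, confusion, prior):
    """Re-estimate label probabilities for one pixel.

    classifier_probs: dict c -> p(c | d), the classifier's output distribution
    confusion:        dict l -> dict c -> p(c | l), output c given true label l
    prior:            dict l -> p(l), prior over true labels for this image
    Returns a dict l -> p(l | d) under the assumed decomposition.
    """
    labels = list(prior)
    refined = {l: 0.0 for l in labels}
    for c, p_c in classifier_probs.items():
        # Bayes' rule: p(l | c) is proportional to p(c | l) * p(l).
        weights = {l: confusion[l].get(c, 0.0) * prior[l] for l in labels}
        total = sum(weights.values())
        if total == 0.0:
            continue  # no true label with nonzero prior explains output c
        for l in labels:
            refined[l] += p_c * weights[l] / total
    norm = sum(refined.values())
    if norm > 0.0:
        refined = {l: v / norm for l, v in refined.items()}
    return refined


# Hypothetical confusion probabilities and classifier output (assumptions).
confusion = {
    "ground": {"ground": 0.7, "sidewalk": 0.3},
    "sidewalk": {"ground": 0.1, "sidewalk": 0.9},
}
classifier_probs = {"ground": 0.2, "sidewalk": 0.8}

# A prior favoring ground shifts probability away from the sidewalk prediction.
soft = refine(classifier_probs, confusion, {"ground": 0.75, "sidewalk": 0.25})

# A prior that rules sidewalk out of the image removes it from the result.
hard = refine(classifier_probs, confusion, {"ground": 1.0, "sidewalk": 0.0})
print(soft, hard)
```

In this sketch, a prior that assigns zero probability to a label suppresses out-of-context assignments of that label entirely, while a softer prior only reweights the classifier's confidences using the confusion statistics.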

The rest of this paper is organized as follows. Section II describes related work in semantic segmentation. Section III describes our framework for classifier refinement using label confusion probabilities and priors. Section IV presents experimental results showing the performance improvements of the approach on multiple datasets, followed by a conclusion in Section V.

Figure 1. Example contextual errors of RefineNet on PASCAL-Context. In-context: The sofa back in the left image is classified as floor, which is incorrect even though the label correctly appears elsewhere in the image. Out-of-context: A portion of the right image is classified as motorbike, which is not a valid label anywhere in the image.


Convolutional Neural Networks (CNNs), such as those in Lin et al. (2017), Long et al. (2015), Chen et al. (2016), and Lin et al. (2016), have achieved unprecedented results in semantic segmentation. However, they also introduce a tension between precise localization and the inclusion of broader image context (Garcia-Garcia, Ors-Escolano, Oprea, Villena-Martinez, & Garcia-Rodriguez, 2017). Many recent innovations in network architecture have been motivated by this issue.
