Metamorphic Testing of Image Classification and Consistency Analysis Using Clustering

Hemanth Gudaparthi, Prudhviraj Naidu, Nan Niu
DOI: 10.4018/IJMDEM.304390

Abstract

Testing deep learning systems requires expensive labeled data. In recent years, researchers began to leverage metamorphic testing to address this issue. However, metamorphic relations on image data remain poorly understood. To gain a deeper understanding of these metamorphic relations, we survey common image operations modeling covariate shift, manually classify and categorize the underlying metamorphic relations, and conduct experiments to validate our classifications. In our experiments, we train three popular convolutional neural network architectures on an image classification task. Next, we apply metamorphic operations to the input test images and measure the change in classification accuracy and cross-entropy loss. A hierarchical clustering algorithm clusters these results and plots a dendrogram. We compare the groups from manual classification with the clusters from the algorithm to provide key insights. We find that Affine and Noise relations are consistent. Furthermore, we recommend metamorphic relations to save time and better test deep learning systems in the future.

Introduction

Deep learning (DL) has proliferated not just in research papers but in many walks of life. For example, our ongoing work exploits DL in predicting combined sewer overflows (Gudaparthi et al., 2020; Challa et al., 2020; Matlibe et al., 2021). If data is the new oil (Bhageshpur, 2019), then DL systems have become the de facto refineries. One area where DL has shone the brightest is computer vision. The driving force behind this rise is convolutional neural networks (CNNs). In 2012, Ciresan et al. (2012) used CNNs to achieve near-human performance in image classification on the MNIST dataset. From self-driving cars (Bojarski et al., 2016), through detecting objects in aerial images (Xia et al., 2018), to photo beauty mobile applications (Xu et al., 2019), CNN-based computer vision solutions have become widespread. As with all software systems, DL systems come with their own bugs and fallacies.

Because one premise of DL is to raise the level of intelligence, and thereby to require less human validation of the system, its bugs can lead to greater disasters. For instance, in a 2018 Tesla Autopilot crash, the National Transportation Safety Board found that the crash occurred due to “limitations of the Tesla Autopilot vision system’s processing software to accurately maintain the appropriate lane of travel” (National Transportation Safety Board, 2020). In another case, which occurred in China, a facial recognition system tagged a woman as a jaywalker even though she was never actually at the intersection (Liao, 2018).

These cases illustrate why thoroughly testing DL systems has become critical. However, software testing often requires labeled data, which is a costly resource commonly expended in training DL systems rather than testing them. Moreover, data labeling is time-consuming and can be error-prone. Additionally, real-world data can change before a DL system can be tested.

To address the bottleneck of insufficient labeled data, researchers began to leverage metamorphic testing (Chen et al., 1998), a property-based software testing technique useful for alleviating the oracle problem (Lin et al., 2018) and for generating new test cases. The prototypical example of metamorphic testing is a program that computes the sine function (Segura et al., 2016): the exact value of sine(x) could depend on how floating-point computations are handled in the specific implementation, which is an instance of the oracle problem. Metamorphic testing uses properties like sine(x) = sine(π−x) to test any implementation without having to know the concrete value of either sine calculation, i.e., without knowing the test oracle of sine(x) or the test oracle of sine(π−x).
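
The sine example can be sketched in a few lines of Python. This is a minimal illustration rather than code from the article: the helper `check_sine_mr`, the tolerance, the input range, and the deliberately faulty `buggy_sine` are all assumptions introduced here. The check never consults a table of correct sine values; it only compares each source output with its follow-up output.

```python
import math
import random


def check_sine_mr(sine_impl, trials=1000, tol=1e-9, seed=0):
    """Return the inputs x for which sine_impl(x) and sine_impl(pi - x) disagree.

    Hypothetical helper: no expected sine value is ever needed, only the
    relation between the source output and the follow-up output.
    """
    rng = random.Random(seed)
    violations = []
    for _ in range(trials):
        x = rng.uniform(-10.0, 10.0)         # source test case
        source = sine_impl(x)
        follow_up = sine_impl(math.pi - x)   # follow-up test case
        if not math.isclose(source, follow_up, abs_tol=tol):
            violations.append(x)
    return violations


def buggy_sine(x):
    """Hypothetical faulty implementation: a two-term Taylor series."""
    return x - x**3 / 6.0


if __name__ == "__main__":
    print("math.sin violations:", len(check_sine_mr(math.sin)))
    print("buggy_sine violations:", len(check_sine_mr(buggy_sine)))
```

A tolerance is used instead of exact equality because π−x is itself rounded in floating point, so even a correct implementation may differ in the last few bits.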

Properties like sine(x) = sine(π−x) are known as metamorphic relations (MRs; Lin et al., 2021). Each MR consists of two parts: (1) an input transformation that can be used to generate new test cases from existing test data, and (2) an output relation that compares the outputs produced by a pair of test cases (Segura et al., 2016). As far as image data is concerned, Ding et al. (2016) developed five MRs based on input biological-cell images to validate open-source light scattering simulation software that performs the discrete dipole approximation; these MRs alter the image size, shape, orientation, refractive index, etc.
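
The two-part structure of an MR maps naturally onto a small data type. The sketch below is one possible encoding under stated assumptions, not the authors' implementation: `MetamorphicRelation`, `violation_rate`, the horizontal-flip relation, and both toy classifiers are hypothetical names chosen for illustration, and the classifiers are simple stand-ins for a trained CNN.

```python
from dataclasses import dataclass
from typing import Callable, List

# Grayscale image as rows of pixel intensities in [0, 1] (illustrative type).
Image = List[List[float]]


@dataclass
class MetamorphicRelation:
    name: str
    transform: Callable[[Image], Image]          # part 1: input transformation
    output_relation: Callable[[int, int], bool]  # part 2: output relation


def horizontal_flip(image: Image) -> Image:
    return [list(reversed(row)) for row in image]


def same_label(source_label: int, follow_up_label: int) -> bool:
    return source_label == follow_up_label


def toy_classifier(image: Image) -> int:
    """Hypothetical stand-in for a CNN: label 1 if the image is bright overall."""
    pixels = [p for row in image for p in row]
    return int(sum(pixels) / len(pixels) > 0.5)


def left_column_classifier(image: Image) -> int:
    """Hypothetical flawed model that looks only at the leftmost column."""
    column = [row[0] for row in image]
    return int(sum(column) / len(column) > 0.5)


def violation_rate(classifier, relation: MetamorphicRelation, images) -> float:
    """Fraction of unlabeled images whose source and follow-up outputs break the MR."""
    violations = 0
    for image in images:
        source_label = classifier(image)
        follow_up_label = classifier(relation.transform(image))
        if not relation.output_relation(source_label, follow_up_label):
            violations += 1
    return violations / len(images)


flip_mr = MetamorphicRelation("horizontal flip", horizontal_flip, same_label)
images = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[1.0, 1.0], [1.0, 1.0]],
]

if __name__ == "__main__":
    print(violation_rate(toy_classifier, flip_mr, images))
    print(violation_rate(left_column_classifier, flip_mr, images))
```

Note that no ground-truth labels appear anywhere: the violation rate is computed purely from pairs of model outputs, which is what makes the approach attractive when labeled test data is scarce.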
