A Scalable Unsupervised Classification Method Using Rough Set for Remote Sensing Imagery

Aditya Raj, Sonajharia Minz
DOI: 10.4018/IJSSCI.2021040104

Abstract

Reference to geographic scale and geographic space representation are characteristics of geospatial data. This work discusses two issues related to satellite image data, namely huge size and mixed pixels. Clustering is an unsupervised classification technique in which similar objects are grouped together based on a similarity measure: the similarity between intracluster objects is high, whereas the similarity between intercluster objects is low. This paper proposes a clustering technique called spatial rough k-means that classifies mixed pixels based on their spatial neighbourhood relationships. The authors compared the performance of different state-of-the-art clustering algorithms with that of the proposed algorithm under two approaches for handling large inputs: image partitioning and map-reduce. The results show that the proposed algorithm produced clusters of better quality than the state-of-the-art algorithms under both approaches. Experiments conducted on Landsat-5 TM data of the Delhi region demonstrate the effectiveness of the proposed work.
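The abstract names map-reduce as one of the two strategies for handling large inputs, but this preview does not describe how the authors implemented it. As a hedged illustration only, the sketch below expresses one iteration of plain k-means (not the proposed spatial rough k-means) in map-reduce form: each data partition is mapped to per-cluster partial sums, the partial sums are grouped by cluster id, and a reduce step combines them into new centroids. The function names are hypothetical, and everything runs in a single Python process; a real deployment would distribute the map calls across workers with a framework such as Hadoop or Spark.

```python
from collections import defaultdict
from functools import reduce


def nearest(pixel, centroids):
    """Index of the centroid closest to `pixel` (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda k: sum((p - c) ** 2 for p, c in zip(pixel, centroids[k])))


def map_partition(pixels, centroids):
    """Map step (hypothetical name): emit (cluster_id, (vector_sum, count)) for one partition."""
    partial = {}
    for px in pixels:
        k = nearest(px, centroids)
        s, n = partial.get(k, ([0.0] * len(px), 0))
        partial[k] = ([a + b for a, b in zip(s, px)], n + 1)
    return list(partial.items())


def combine(a, b):
    """Reduce step: merge two (vector_sum, count) pairs that share a cluster id."""
    return [x + y for x, y in zip(a[0], b[0])], a[1] + b[1]


def mapreduce_kmeans_step(partitions, centroids):
    """One k-means iteration over data split into partitions (e.g. image tiles)."""
    grouped = defaultdict(list)
    for part in partitions:                 # each map call could run on a different worker
        for k, value in map_partition(part, centroids):
            grouped[k].append(value)        # shuffle: group partial sums by cluster id
    new_centroids = list(centroids)         # a cluster with no pixels keeps its old centroid
    for k, values in grouped.items():
        total, count = reduce(combine, values)
        new_centroids[k] = tuple(x / count for x in total)
    return new_centroids


partitions = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 10.0), (10.0, 11.0)]]
print(mapreduce_kmeans_step(partitions, [(0.0, 0.0), (10.0, 10.0)]))
# [(0.0, 0.5), (10.0, 10.5)]
```

Because vector sums and counts can be combined in any order, the partitioned iteration yields the same centroids as a single-machine k-means step (up to floating-point rounding), which is why k-means maps naturally onto the map-reduce model.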

1. Introduction

A phenomenon on the surface of the earth can be described by sets of numerical values in a geographic coordinate system; information represented in this way is called geospatial data. Reference to geographic scale and geographic space representation are the characteristics of geospatial data. Images from health care units, census data, data generated by banks, maps and student records in universities are a few examples of spatial data. Autocorrelation, heterogeneity and the highly aggregated nature of the data are a few features of spatial data. Raster and vector models are used to represent spatial data. Raster data store information in the form of matrices of numerical values; remote sensing data, satellite images, images taken from unmanned aerial vehicles and geo-tagged photos are some examples of raster data. Processing and analyzing raster data are challenging tasks because of several issues. The first is vastness: raster data are enormous and contain a large number of pixels. For example, one LANDSAT satellite pass captures 16 billion pixels of Australia, amounting to 400 billion pixels annually and 10 trillion pixels over the satellite's lifetime (Mills et al., 2018). The second issue in raster data analysis is the lack of class information for pixels (Aydav and Minz, 2019). The pixels of satellite images contain the reflectance values of light from the earth's surface at different wavelengths, and this band-wise information represents the natural phenomena in a spatial area. In the case of high spatial resolution and low spectral resolution images, it becomes challenging to predict the class of the underlying pixels. A third issue pertains to quality: owing to the lack of labelled information and the non-IID (non-independent and identically distributed) nature of the data, raster data contain mixed pixels.
Because the reflectance values of all geo-objects lying within the area covered by a pixel are aggregated, the reflectance value of that pixel is a blend, resulting in a spectral value that does not clearly indicate the pre-defined class to which the pixel should ideally belong. The inclusion of such pixels in a cluster degrades its quality. In this manuscript, we address three issues of raster data: lack of class information, mixed pixels and vastness of data.
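To make the mixed-pixel problem concrete, the sketch below assumes a linear mixing model, in which a pixel's spectrum is the area-weighted sum of the spectra of the land-cover classes inside it. The linear model, the two-band reflectance values and the class names are illustrative assumptions, not data from the paper. A pixel that is half vegetation and half built-up ends up equally far from both pure spectra, so a nearest-centre rule has no basis for preferring either class, while a 70/30 pixel is only moderately closer to its dominant class.

```python
import math


def mix(endmembers, fractions):
    """Linear mixing (an assumption): area-weighted sum of the pure spectra inside one pixel."""
    if not math.isclose(sum(fractions), 1.0):
        raise ValueError("area fractions must sum to 1")
    n_bands = len(endmembers[0])
    return tuple(sum(f * e[b] for f, e in zip(fractions, endmembers)) for b in range(n_bands))


# Hypothetical two-band reflectances of two pure classes.
vegetation = (0.1, 0.5)
built_up = (0.3, 0.1)

for fractions in [(1.0, 0.0), (0.7, 0.3), (0.5, 0.5)]:
    pixel = mix([vegetation, built_up], fractions)
    d_veg, d_built = math.dist(pixel, vegetation), math.dist(pixel, built_up)
    # Distance of the mixed spectrum to each pure class spectrum.
    print(fractions, [round(v, 3) for v in pixel], round(d_veg, 3), round(d_built, 3))
```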

To handle the lack of class information, we used clustering (unsupervised classification) rather than supervised classification. Clustering is a data mining technique in which similar objects, based on their characteristics or features, are placed together within a cluster (Aggarwal and Minz, 2014). Thus, a cluster contains a group of similar objects, while dissimilar objects are placed in different clusters. The similarity between objects within a cluster is measured through the distance between the cluster centre and each object. If the distances between the centre of a cluster and two objects are both within a threshold, then the two objects belong to the same cluster, i.e. if $d(x_i, c_k) \le \delta$ and $d(x_j, c_k) \le \delta$, then $x_i, x_j \in C_k$, where $c_k$ is the centre of cluster $C_k$ and $\delta$ is the threshold.
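The membership rule in this paragraph can be written in a few lines of code. In this minimal sketch, Euclidean distance stands in for the distance measure and the threshold `delta` is a hypothetical parameter; the paragraph specifies neither.

```python
import math


def same_cluster(x_i, x_j, centre, delta):
    """True if both objects lie within distance `delta` of the cluster centre (Euclidean)."""
    return math.dist(x_i, centre) <= delta and math.dist(x_j, centre) <= delta


centre = (1.0, 0.0)
print(same_cluster((1.0, 1.0), (2.0, 0.0), centre, delta=1.5))  # True: both distances are 1.0
print(same_cluster((1.0, 1.0), (4.0, 0.0), centre, delta=1.5))  # False: second distance is 3.0
```

This fixed-threshold test is one way of reading the rule; k-means itself assigns each object to its nearest centre without a fixed threshold.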

A high-quality cluster should have high cohesion (intracluster similarity) and low coupling (intercluster similarity). Clustering techniques can be broadly categorized as partition-based, distribution-based, connectivity-based and density-based. k-Means, Rough k-Means, Fuzzy C-Means and k-Medoids are a few examples of clustering techniques. Amongst the state-of-the-art clustering techniques, we used the k-Means and Rough k-Means algorithms.
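Rough k-Means is one of the two baselines named in this paragraph, and the abstract builds the proposed spatial rough k-means on the same idea. The sketch below shows a rough k-means in the style of Lingras and West; it is not the authors' spatial variant, which this preview does not describe. An object whose distance to a rival centroid is within a ratio `eps` of its distance to the nearest centroid is treated as ambiguous and placed in the boundary of all those clusters; otherwise it goes to the lower approximation of its nearest cluster. Each centroid is then a weighted combination of the lower-approximation mean and the boundary mean. The values of `eps`, `w_lower` and `w_upper`, the function names and the toy points are assumptions chosen for illustration.

```python
import math


def _mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return tuple(sum(col) / len(vectors) for col in zip(*vectors))


def rough_kmeans(points, centroids, eps=1.2, w_lower=0.7, w_upper=0.3, max_iter=50):
    """Rough k-means in the style of Lingras and West (illustrative parameter values).

    Cluster j has a lower approximation lower[j] (objects that surely belong to it) and a
    boundary boundary[j] (objects that may belong to it); its upper approximation is
    lower[j] + boundary[j]. `eps` must be >= 1.
    """
    centroids = [tuple(c) for c in centroids]
    k = len(centroids)
    for _ in range(max_iter):
        lower = [[] for _ in range(k)]
        boundary = [[] for _ in range(k)]
        for x in points:
            d = [math.dist(x, c) for c in centroids]
            best = min(range(k), key=d.__getitem__)
            # Rival clusters whose centroid is nearly as close as the best one.
            rivals = [j for j in range(k) if j != best and d[j] <= eps * d[best]]
            if rivals:                      # ambiguous object: boundary of every close cluster
                for j in [best] + rivals:
                    boundary[j].append(x)
            else:                           # unambiguous object: lower approximation only
                lower[best].append(x)
        new = []
        for j in range(k):
            if lower[j] and boundary[j]:    # weighted combination of the two parts
                lo, bd = _mean(lower[j]), _mean(boundary[j])
                new.append(tuple(w_lower * a + w_upper * b for a, b in zip(lo, bd)))
            elif lower[j]:
                new.append(_mean(lower[j]))
            elif boundary[j]:
                new.append(_mean(boundary[j]))
            else:                           # empty cluster keeps its previous centroid
                new.append(centroids[j])
        if new == centroids:                # converged: assignments will not change
            break
        centroids = new
    return centroids, lower, boundary


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (5, 5)]
centroids, lower, boundary = rough_kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(lower)     # the two compact groups
print(boundary)  # (5, 5) sits in the boundary of both clusters
```

With `eps` close to 1, very few objects fall into boundaries and the procedure behaves much like ordinary k-means; a larger `eps` widens the boundary regions, where ambiguous objects such as mixed pixels can be held back instead of being forced into a single cluster.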

Figure 1. Flowchart showing the issues and concepts used to handle them
