Remote Sensing Image Segmentation Based on a Novel Gaussian Mixture Model and SURF Algorithm

This paper proposes a novel remote sensing image segmentation method based on a Gaussian mixture model and the SURF algorithm. First, a Gaussian mixture model is used for remote sensing image segmentation. Then the SURF matching algorithm is adopted to eliminate misidentified areas. The determinant of the Hessian matrix (DoH) is used to detect key points in the image. Non-maximum suppression and interpolation are used to search for and locate the extreme points. The maximum likelihood method is used to estimate the model parameters. Several remote sensing images from the DOTA data set are selected for experimental verification, and the results show that the new algorithm clearly improves both segmentation quality and efficiency. For images with complex backgrounds, the improved algorithm shows more obvious advantages over state-of-the-art segmentation methods.


INTRODUCTION
Image segmentation is an important stage in image processing and has been widely used in image recognition, 3D reconstruction and target tracking (Hong et al. 2021; Teng et al. 2022). The areas of interest in the image are called the target or foreground, and the rest is called the background. The essence of image segmentation is to separate and extract the target area in the image (Hatamizadeh et al. 2022).
With the rapid development of remote sensing sensor technology, the spatial resolution of remote sensing images is constantly improving and the information they contain is more abundant, so they are widely used in many fields (Tesfamikael et al. 2021). However, as resolution improves, the spectral heterogeneity of similar features is enhanced, which makes the processing of remote sensing images more difficult. Image segmentation is a key link in image processing, and its results have an important impact on subsequent processing (Cheng et al. 2021; Minaee et al. 2021). Remote sensing image segmentation covers two basic tasks: 1) determine the number of ground objects (that is, the number of classes) contained in the region to be segmented, and 2) accurately segment the object regions.
In modern remote sensing image segmentation, because of the improved spatial resolution, the variety of ground objects, the complex background environment and the difficulty of field investigation, it is hard to manually interpret the number of land objects covered in a remote sensing image (Xu et al. 2021; Liu et al. 2021). If the number of ground-object classes cannot be determined correctly, the segmentation result will be wrong. In addition, high resolution enhances the ability of a remote sensing image to express the details of the covered surface. On the other hand, excessive detail increases the difference between the spectral measures of pixels within similar ground objects (manifested as increased noise), which causes great difficulties for segmentation (Lu 2021; Wang et al. 2022). Therefore, automatically determining the class number and improving the anti-noise performance of the algorithm have always been important tasks in research on high-resolution remote sensing image segmentation (Li et al. 2021).
Among the many remote sensing image segmentation methods, the statistical segmentation framework is the most effective. A segmentation method based on a mixture model fits the image to be segmented with a weighted sum of multiple probability distributions. The probability distribution of each mixture component describes the statistical distribution of the pixel spectral measure in a specific target class (Yuan et al. 2021). Among these models, the GMM (Gaussian mixture model) is widely used in remote sensing image segmentation because of its simple principle, intuitive structure and easy implementation (Zhu et al. 2022; Chen et al. 2021). However, the traditional GMM uses only the gray information of the image and does not introduce the spatial relationship between neighboring pixels, which makes the segmentation result very sensitive to image noise.
At present, image segmentation algorithms can be divided into supervised methods, unsupervised methods (Mirsadeghi et al. 2021) and interactive methods (Chen et al. 2021). Interactive image segmentation obtains the region containing the target with the help of manual interaction, and its segmentation results are more accurate. Among these methods, the GrabCut algorithm has a convenient interactive mode (Salau et al. 2021) and a good segmentation effect, so it has become a research hotspot in the field of image segmentation. Gupta et al. (2021) introduced the concept of graph theory into image segmentation. The image was mapped to a weighted graph with pixels as nodes. Edges between nodes represented the relationship between pixels, and weights represented the degree of correlation between pixels. Finally, the maximum flow/minimum cut was used to segment the graph, so the image segmentation problem was transformed into a graph vertex labeling problem. The graph cuts algorithm proposed by Bao et al. (2001) and the interactive GrabCut color image segmentation algorithm (Koresh et al. 2021) both belong to the interactive image segmentation algorithms based on graph theory. In (Ma et al. 2021; Di et al. 2021), the superpixel method was used to pre-process images to improve the segmentation accuracy and time efficiency. Zhang et al. (2021) used significant preprocessing to improve the segmentation performance. In (Jin et al. 2021), Bayesian classification and simple linear generation selection clustering were used to improve the algorithm. In (Yong et al. 2017), GrabCut with wavelet-transform texture was used for image segmentation. However, the segmentation effect of the above algorithms at the junction of foreground and background is not ideal, and the results show under-segmentation to some extent. Moreover, these algorithms optimize the Gaussian mixture model through iterative learning, which takes time and results in low segmentation efficiency.
To address the susceptibility of the GMM algorithm to image noise, many scholars have introduced spatial constraints into the design of remote sensing image segmentation. GMM combined with MRF (Markov random field) has become a research hotspot in the field of image segmentation. A spatially variant finite mixture model (SVFMM) image segmentation method was proposed in (Juan-Albarracín et al. 2021). Based on GMM, the weight coefficient of each Gaussian component was expressed as the prior probability of each pixel. The spatial correlation of the prior probability was modeled by a Gibbs distribution to improve the anti-noise performance of the algorithm. Although this method reduced the sensitivity of segmentation results to image noise, it was not flexible in smoothing noise because the parameter $\beta$, which controls the noise-smoothing ability of the prior distribution, was set as a constant. To further improve the anti-noise performance and overcome the defects of the SVFMM algorithm, A-SVFMM (adaptive spatially variant finite mixture model) was proposed in (Ji et al. 2014) on the basis of SVFMM. The algorithm used MRF to re-model the prior distribution. It assumed that the prior probability obeyed a normal distribution with a mean of 0 and a certain variance, and the variance was set as a random variable related to the class label. The maximum likelihood (ML) method was used to solve the model parameters. Because the constraints on the prior probability had to be met, deriving the analytic expression of the prior probability was complicated. Although this method improved the flexibility of the algorithm against noise, it increased the difficulty of solving the model parameters. At the same time, the accuracy of its segmentation results on high-resolution remote sensing images was not high.
Although the above methods improve the anti-noise performance of the algorithms to a certain extent, they do not automatically determine the number of image classes. To determine the class number automatically, many scholars have proposed variable-class image segmentation methods based on the Gaussian mixture model. Pravitasari et al. (2021) proposed an image segmentation algorithm based on the RJMCMC (reversible jump Markov chain Monte Carlo) method with a multivariate Gaussian mixture model. The sampling operations of the algorithm included label field sampling, Gaussian distribution parameter sampling, mixture weight coefficient sampling, MRF parameter sampling and class number sampling. The algorithm could determine the most probable class number of the image and estimate the model parameters to achieve image segmentation. However, this algorithm was only applied to color image segmentation, not to high-resolution remote sensing images. Meanwhile, the RJMCMC method used in this algorithm had many steps and was difficult to implement. Baumgartner et al. (2015) put forward a remote sensing image segmentation algorithm combining GMM and MRF under the framework of Bayesian theory, and used the maximum posterior probability to achieve image segmentation. A new energy function was proposed to describe the difference between pixel spectral measure values and the distance between elements, which improved the anti-noise performance of the algorithm. However, the form of the energy function was complex and did not have universal applicability. To determine the class number automatically, the method specified several candidate class numbers in advance within a certain possible range and computed the corresponding optimal segmentation for each candidate. The entropy of each optimal segmentation was calculated one by one, and the maximum entropy criterion (Rafique et al. 2022) was adopted to determine the optimal class number: the class number corresponding to the convergence point of the entropy value was taken as optimal. Although the maximum entropy criterion could determine the class number of images, it required fixed-class-number segmentation to be performed many times, which led to heavy computation and low efficiency.
Therefore, we propose a novel remote sensing image segmentation method based on the Gaussian mixture model and the SURF algorithm. The main contributions are as follows. First, the Gaussian mixture model is used for remote sensing image segmentation. Then the SURF matching algorithm is adopted to eliminate misidentified areas. The determinant of the Hessian matrix (DoH) is used to detect key points in the image. Non-maximum suppression and interpolation are used to search for and locate the extreme points. The main direction of the key points is determined by the gray centroid method. The key points are then described using binary descriptors. The maximum likelihood method is used to estimate the model parameters.

Definition of Image Segmentation
Image segmentation is an important task for understanding images and extracting effective information from them. It is a hot research problem in image processing and computer vision. Image segmentation divides the image into several independent and unique regions, which is the key step of image processing in machine vision. Image segmentation techniques are mainly divided into the following categories: threshold-based, region-based and edge-based methods, and segmentation methods based on specific theories (Wang et al. 2021). Image segmentation is defined mathematically as the process of dividing an image into non-intersecting regions. The process of image segmentation is also a marking process, that is, the pixels belonging to the same area are assigned the same number. The mathematical representation is as follows.
Let $R$ represent the entire spatial region occupied by an image. Image segmentation can be regarded as the process of dividing $R$ into $n$ subregions $R_1, R_2, \ldots, R_n$. The following conditions should be satisfied:
(1) $\bigcup_{i=1}^{n} R_i = R$;
(2) $R_i$ is a connected set, $i = 1, 2, \ldots, n$;
(3) $R_i \cap R_j = \varnothing$ for all $i \neq j$;
(4) $Q(R_i) = \mathrm{TRUE}$ for $i = 1, 2, \ldots, n$;
(5) $Q(R_i \cup R_j) = \mathrm{FALSE}$ for any adjacent regions $R_i$ and $R_j$,
where $Q(R_k)$ is a logical property defined over the points of set $R_k$, and $\varnothing$ represents the empty set.
The symbols $\cup$ and $\cap$ represent the union and intersection of sets, respectively. If the union of $R_i$ and $R_j$ forms a connected set, the two regions are adjacent. Conditions 1 and 2 are interpreted as follows: the segmentation must be complete, that is, every pixel belongs to a region, and the points in a region must be connected in some predefined way (that is, 4-connected or 8-connected). Condition 3 states that the regions must be disjoint. Condition 4 concerns the property that the pixels in a segmented region must satisfy, for example that they all have the same gray level. Condition 5 states that two adjacent regions $R_i$ and $R_j$ must be different in the sense of the property $Q$.
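The conditions above can be checked mechanically on a label map. The following is a minimal sketch, assuming 4-connectivity and taking the property Q to be "all pixels share one gray level"; the function names `region_is_connected` and `check_segmentation` are hypothetical and are not part of the proposed method. Completeness and disjointness hold by construction, because a label map assigns exactly one label to every pixel.

```python
import numpy as np

def region_is_connected(mask: np.ndarray) -> bool:
    """Return True if the True pixels of `mask` form one 4-connected set."""
    coords = np.argwhere(mask)
    if coords.size == 0:
        return False
    seen = np.zeros_like(mask, dtype=bool)
    start = tuple(coords[0])
    seen[start] = True
    stack = [start]
    # Depth-first flood fill over the 4-neighborhood.
    while stack:
        r, c = stack.pop()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < mask.shape[0] and 0 <= nc < mask.shape[1]
            if inside and mask[nr, nc] and not seen[nr, nc]:
                seen[nr, nc] = True
                stack.append((nr, nc))
    return bool(seen.sum() == mask.sum())

def check_segmentation(image: np.ndarray, labels: np.ndarray) -> bool:
    """Check connectivity, Q(R_i) = TRUE and Q(R_i U R_j) = FALSE.

    Assumption: Q means 'all pixels in the region have the same gray level'.
    """
    ids = np.unique(labels)
    connected = all(region_is_connected(labels == i) for i in ids)
    uniform = all(np.unique(image[labels == i]).size == 1 for i in ids)
    # Pixel pairs that straddle a region border must differ in gray level.
    h = labels[:, 1:] != labels[:, :-1]
    v = labels[1:, :] != labels[:-1, :]
    distinct = (np.all(image[:, 1:][h] != image[:, :-1][h])
                and np.all(image[1:, :][v] != image[:-1, :][v]))
    return bool(connected and uniform and distinct)
```

For real remote sensing data the equal-gray-level property is far too strict; it is used here only because it is the example property mentioned in the text.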

Image Segmentation Based on Clustering
There is no unified general theory of image segmentation. Combined with many new theories and methods, a variety of image segmentation algorithms have been proposed. Clustering is an important approach: a feature space clustering algorithm is used to cluster pixels with similar features in the image. Based on the clustering results of the pixels, the feature space of each cluster is mapped back to the original image to get the segmentation result.
GMM is a probabilistic model that is often used in image segmentation. The underlying idea is to aggregate samples into different Gaussian models according to intra-class distance or probabilistic likelihood. In the sample space, the closer two samples are, the higher the probability that they are clustered into the same class.
The advantages of clustering algorithms in image segmentation are that they are simple, fast, efficient and scale well to large data sets. The time complexity is linear, which makes them suitable for processing high-dimensional images. However, an important problem is that clustering algorithms are sensitive to the initial value and cannot estimate the number of clusters effectively. In addition, clustering algorithms (Yin et al. 2020) such as EM aim at minimizing distance or maximizing likelihood, so their performance is poor in non-convex Gaussian models.

PROPOSED REMOTE SENSING IMAGE SEGMENTATION METHOD
In this section, we summarize the proposed method, as shown in Figure 1. It contains three main parts: GMM for probabilistic estimation, SURF feature matching, and ratio detection for removing mismatched points.

Gaussian Mixture Model
GMM uses Gaussian probability density functions to quantify an image and decomposes the image into several components, each described by a Gaussian probability density function. The process of GMM image segmentation is as follows. Suppose the image is divided into $K$ parts, and the pixels in each image region obey a normal distribution with mean $\mu$ and variance $\sigma^2$. Then the feature distribution of the whole image can be described by a GMM.
If the Gaussian distribution of the $k$th region is given by Formula (1), then the distribution of the whole image can be expressed by Formula (2):
$$p(x_i \mid \theta_k) = \frac{1}{\sqrt{2\pi\sigma_k^2}} \exp\left(-\frac{(x_i - \mu_k)^2}{2\sigma_k^2}\right) \quad (1)$$
$$p(x_i \mid \pi, \theta) = \sum_{k=1}^{K} \pi_k \, p(x_i \mid \theta_k) \quad (2)$$
where $x_i$ is the spectral measure of pixel $i$, $\theta_k = (\mu_k, \sigma_k^2)$, $\pi_k$ is the weight of the $k$th component, $i = 1, \ldots, n$ and $k = 1, \ldots, K$. To achieve image segmentation, it is necessary to obtain the joint probability density function of $\pi_k$ and $k$ given the image $x$ and $\theta$. According to Bayes' theorem, this joint probability density function is expressed as Equation (4). The parameters in Equation (4) include the class number $k$, the mean vector …
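As an illustration of how a fitted mixture turns into a segmentation, the sketch below evaluates the single-Gaussian density and the mixture density for gray values, then labels each pixel with the component that has the largest weighted density. It assumes scalar (gray-level) pixel values and already-known parameters; the function names and the example parameter values are hypothetical, not taken from the paper.

```python
import numpy as np

def gaussian_pdf(x, mu, var):
    """Density of a univariate normal distribution with mean mu and variance var."""
    return np.exp(-(x - mu) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def mixture_pdf(x, weights, means, variances):
    """Weighted sum of the K component densities, evaluated at each x."""
    x = np.asarray(x, dtype=float)[..., None]      # add a trailing component axis
    weights, means, variances = map(np.asarray, (weights, means, variances))
    return np.sum(weights * gaussian_pdf(x, means, variances), axis=-1)

def map_labels(x, weights, means, variances):
    """Assign each pixel to the component with the largest weighted density."""
    x = np.asarray(x, dtype=float)[..., None]
    weights, means, variances = map(np.asarray, (weights, means, variances))
    return np.argmax(weights * gaussian_pdf(x, means, variances), axis=-1)

# Usage: a 2x2 "image" with a dark class near 0 and a bright class near 10.
image = np.array([[0.2, 9.5], [10.3, -0.4]])
labels = map_labels(image, [0.5, 0.5], [0.0, 10.0], [1.0, 1.0])
```

Because the trailing axis is broadcast against the parameter vectors, the same functions work on a whole image array without an explicit pixel loop.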

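Solving $\varphi$ with the ML Estimation
We construct the log-likelihood function of the mixture as:
$$L(\varphi) = \sum_{i=1}^{n} \log \sum_{j=1}^{K} \pi_j \, p(x_i \mid \theta_j) \quad (5)$$
where $j$ indexes the $K$ Gaussian components and $\varphi = \{\pi_j, \mu_j, \sigma_j^2\}$ denotes the parameters to be estimated. Equation (5) is used as the objective function for parameter solving. However, since Equation (5) contains the logarithm of a sum, it is difficult to maximize directly, so the objective function needs to be further simplified.
For a given random variable $X$ and any concave function $f(X)$, Jensen's inequality can be expressed as
$$f(E[X]) \ge E[f(X)] \quad (6)$$
and equality holds if and only if $X$ is constant. Let $z_{ij} \ge 0$ with $\sum_{j=1}^{K} z_{ij} = 1$, and let $Y_{ij} = \pi_j \, p(x_i \mid \theta_j) / z_{ij}$ be the values taken by a discrete random variable $Y_i$. $z_{ij}$ can be regarded as the probability of obtaining $Y_{ij}$. Thus, the objective function can be written as:
$$L(\varphi) = \sum_{i=1}^{n} \log \sum_{j=1}^{K} z_{ij} \frac{\pi_j \, p(x_i \mid \theta_j)}{z_{ij}} \quad (7)$$
Since $\log(\cdot)$ is a concave function, according to Jensen's inequality, we have:
$$L(\varphi) \ge \sum_{i=1}^{n} \sum_{j=1}^{K} z_{ij} \log \frac{\pi_j \, p(x_i \mid \theta_j)}{z_{ij}} \quad (8)$$
To sum up, let $A = 1$ and $T = 1$, and ignore the terms in Equation (4) that are irrelevant to the parameter $\varphi$. If Formula (8) holds with equality, then $\pi_j \, p(x_i \mid \theta_j) / z_{ij} = c$, where $c$ is a constant that does not depend on $j$. Since the posterior probability meets the condition $\sum_{j=1}^{K} z_{ij} = 1$, the posterior probability formula is:
$$z_{ij} = \frac{\pi_j \, p(x_i \mid \theta_j)}{\sum_{l=1}^{K} \pi_l \, p(x_i \mid \theta_l)} \quad (9)$$
The simplified objective function is:
$$Q(\varphi) = \sum_{i=1}^{n} \sum_{j=1}^{K} z_{ij} \left[ \log \pi_j + \log p(x_i \mid \theta_j) \right] \quad (10)$$
The partial derivatives of the objective function of Equation (10) with respect to $\mu_j$ and $\sigma_j^2$ are calculated respectively. Setting these derivatives to 0 gives:
$$\mu_j = \frac{\sum_{i=1}^{n} z_{ij} \, x_i}{\sum_{i=1}^{n} z_{ij}} \quad (11)$$
$$\sigma_j^2 = \frac{\sum_{i=1}^{n} z_{ij} \, (x_i - \mu_j)^2}{\sum_{i=1}^{n} z_{ij}} \quad (12)$$
The prior distribution needs to satisfy the constraint conditions $0 \le \pi_j \le 1$ and $\sum_{j=1}^{K} \pi_j = 1$. Therefore, the Lagrange multiplier method is introduced to find the partial derivative with respect to the prior probability parameters. Setting it equal to 0 gives:
$$\pi_j = \frac{1}{n} \sum_{i=1}^{n} z_{ij} \quad (13)$$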
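The alternation between computing the posterior probabilities and re-estimating the means, variances and priors is the standard EM iteration for a Gaussian mixture. The sketch below is a minimal one-dimensional version, not the authors' implementation: the quantile-based initialization, the fixed iteration count, the small variance floor and the function name `em_gmm_1d` are all assumptions made for illustration.

```python
import numpy as np

def em_gmm_1d(x, K, n_iter=100):
    """Fit a K-component univariate GMM to samples x with plain EM."""
    x = np.asarray(x, dtype=float).ravel()
    n = x.size
    # Assumed initialization: equal priors, means spread over the quantiles,
    # and the global variance for every component.
    pi = np.full(K, 1.0 / K)
    mu = np.quantile(x, (np.arange(K) + 0.5) / K)
    var = np.full(K, x.var() + 1e-6)
    for _ in range(n_iter):
        # E-step: posterior probability z[i, j] of component j for sample i.
        dens = np.exp(-(x[:, None] - mu) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)
        z = pi * dens
        z /= z.sum(axis=1, keepdims=True)
        # M-step: closed-form updates obtained by zeroing the derivatives.
        nj = z.sum(axis=0)
        mu = (z * x[:, None]).sum(axis=0) / nj
        var = (z * (x[:, None] - mu) ** 2).sum(axis=0) / nj + 1e-6  # floor avoids collapse
        pi = nj / n
    return pi, mu, var, z

# Usage on synthetic gray values drawn from two well-separated classes.
rng = np.random.default_rng(0)
samples = np.concatenate([rng.normal(0.0, 1.0, 500), rng.normal(10.0, 1.0, 500)])
pi, mu, var, z = em_gmm_1d(samples, K=2)
labels = z.argmax(axis=1)   # hard segmentation from the posteriors
```

The returned posteriors `z` come from the last E-step, so they correspond to the parameters of the preceding iteration; after convergence the difference is negligible.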

Feature Points Detection
The SURF algorithm uses integral image theory to replace the filtering of images with Gaussian second-order differential templates by simple additions and subtractions on integral images, which improves the computing speed. The definition of the integral image is described in (Wang et al. 2019). The gray value integral of region $S$ in Figure 2 can be obtained by $S = A - B - C + D$. The Hessian matrix with scale $\sigma$ at the point $\mathbf{x} = (x, y)$ of image $J$ is:
$$H(\mathbf{x}, \sigma) = \begin{bmatrix} L_{xx}(\mathbf{x}, \sigma) & L_{xy}(\mathbf{x}, \sigma) \\ L_{xy}(\mathbf{x}, \sigma) & L_{yy}(\mathbf{x}, \sigma) \end{bmatrix}$$
where $L_{xx}(\mathbf{x}, \sigma)$ is the convolution of the second partial derivative of the Gaussian function with the image $J$ at point $\mathbf{x}$, and $L_{xy}(\mathbf{x}, \sigma)$ and $L_{yy}(\mathbf{x}, \sigma)$ are defined similarly. The Gaussian second-order templates are approximated by box filters, whose responses $D_{xx}$, $D_{xy}$ and $D_{yy}$ replace $L_{xx}$, $L_{xy}$ and $L_{yy}$, as shown in Figure 3. Therefore, the determinant of the Hessian matrix can be abbreviated as:
$$\mathrm{Det}(H_{approx}) = D_{xx} D_{yy} - (\omega D_{xy})^2$$
where $\mathrm{Det}(H_{approx})$ represents the box filtering response value in the area near point $\mathbf{x}$, which can be used to detect extreme points. $\omega$ is a coordination parameter whose size is related to $\sigma$, and $\omega$ is usually set to 0.9. In order to obtain spots at different scales, the Gaussian pyramid of the image needs to be constructed first. The basic idea is to keep the original image size unchanged and to obtain the scale space by changing the filter size. The SURF algorithm continuously increases the size of the box filter template, uses box filter templates of different sizes together with the integral image to obtain the response value of the Hessian matrix, and detects feature points of different scales on the response images.
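The speed-up from the integral image comes from the fact that any rectangular sum needs only four table lookups, regardless of the rectangle size. The sketch below shows this with a zero-padded summed-area table, together with the approximate Hessian determinant for given box-filter responses. The corner convention of `box_sum` is an assumption (Figure 2 is not reproduced here), the function names are hypothetical, and the box-filter layouts themselves are not implemented.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with one extra zero row and column at the top-left."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.float64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0, dtype=np.float64), axis=1)
    return ii

def box_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] using four lookups in the integral image."""
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]

def det_hessian_approx(dxx, dyy, dxy, w=0.9):
    """Approximate Hessian determinant from box-filter responses (w = 0.9)."""
    return dxx * dyy - (w * dxy) ** 2

# Usage: the rectangle sum matches a direct summation over the same window.
img = np.arange(20).reshape(4, 5)
ii = integral_image(img)
window_total = box_sum(ii, 1, 1, 3, 4)   # rows 1..2, columns 1..3
```

Padding the table with a leading zero row and column avoids special cases for windows that touch the image border.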
Non-maximum suppression in a 3×3×3 neighborhood is carried out on the response images at different scales: each pixel in a layer is compared with its 8 neighbors at the same scale and with the 9 pixels in each of the adjacent upper and lower layers (26 neighbors in total), as shown in Figure 4.

If the response at a point is greater than or less than the values of all 26 neighboring points, the point is taken as a candidate feature point. Then a quadratic function fitting method is used to locate the interest points with sub-pixel and sub-scale accuracy, which yields the scale and position of stable interest points and completes the localization.
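The 26-neighbor comparison can be sketched as a scan over a stack of response images. This minimal version keeps only strict maxima above a threshold; minima, the quadratic sub-pixel refinement and the threshold value are left out or assumed, and the function name `local_maxima_3d` is hypothetical.

```python
import numpy as np

def local_maxima_3d(stack, threshold=0.0):
    """Return (scale, row, col) positions that strictly exceed all 26 neighbors.

    `stack` has shape (scales, rows, cols); border positions are skipped
    because they lack a full 3x3x3 neighborhood.
    """
    n_scales, n_rows, n_cols = stack.shape
    peaks = []
    for s in range(1, n_scales - 1):
        for r in range(1, n_rows - 1):
            for c in range(1, n_cols - 1):
                value = stack[s, r, c]
                if value <= threshold:
                    continue
                cube = stack[s - 1:s + 2, r - 1:r + 2, c - 1:c + 2]
                # Strict maximum: the center is the largest value and is unique.
                if value >= cube.max() and np.count_nonzero(cube == value) == 1:
                    peaks.append((s, r, c))
    return peaks

# Usage: one isolated response in the middle layer is reported as a candidate.
responses = np.zeros((3, 5, 5))
responses[1, 2, 2] = 5.0
candidates = local_maxima_3d(responses)
```

The explicit loops favor readability; on full-size response images a vectorized comparison would be preferable.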

Feature Point Descriptor
In order to make the descriptor rotation invariant, the main direction of each feature point must be determined. First, taking the feature point as the center, the Haar wavelet responses in the $x$ and $y$ directions are calculated for the points in the neighborhood of radius $6s$ ($s$ is the scale at which the feature point is located). The Haar wavelet side length is set to $4s$, and the response values are weighted by a Gaussian. Then the response values within a 60° sector are accumulated to form a new vector, and the sector is swept over the whole circular area. The direction of the longest vector is selected as the main direction of the feature point, as shown in Figure 5. The above operations determine the position and main direction of the feature points; the descriptor is then computed for the local region. First, the coordinate axes are rotated to the main direction of the feature point to ensure rotation invariance. With the feature point as the center, a square region aligned with the main direction is selected. The size of this region is $20s \times 20s$, and it is divided into 4×4 sub-blocks. Then the response values in each sub-block are calculated using a Haar wavelet template of size $2s$, and a four-dimensional vector is obtained for each sub-block. Since there are 4×4 sub-blocks in this region, the generated feature descriptor is a 4×4×4 = 64-dimensional feature vector. Finally, the ratio detection method (Chekanov et al. 2022) is used to remove the false matches that arise from multi-point matching.
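A common way to reject ambiguous descriptor matches is the nearest-to-second-nearest distance ratio test. The sketch below assumes that the ratio detection step refers to this test and uses Euclidean distance with a ratio threshold of 0.7; the threshold, the distance measure and the function name `ratio_test_matches` are assumptions, since the text does not specify them.

```python
import numpy as np

def ratio_test_matches(desc_a, desc_b, ratio=0.7):
    """Match rows of desc_a to rows of desc_b, keeping only unambiguous matches.

    A match is kept when the nearest distance is smaller than `ratio` times
    the second-nearest distance. desc_b must contain at least two descriptors.
    """
    desc_a = np.asarray(desc_a, dtype=float)
    desc_b = np.asarray(desc_b, dtype=float)
    # Pairwise Euclidean distances, shape (len(desc_a), len(desc_b)).
    dist = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    order = np.argsort(dist, axis=1)
    matches = []
    for i in range(dist.shape[0]):
        best, second = order[i, 0], order[i, 1]
        if dist[i, best] < ratio * dist[i, second]:
            matches.append((i, int(best)))
    return matches

# Usage with toy 2-D descriptors (real SURF descriptors are 64-dimensional).
a = [[0.0, 0.0], [10.0, 10.0]]
b = [[0.0, 0.1], [5.0, 5.0], [10.0, 10.1]]
good = ratio_test_matches(a, b)
```

A descriptor that is equally close to two candidates fails the test and is discarded, which is the behavior wanted for repetitive textures in complex backgrounds.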

EXPERIMENTS AND ANALYSIS
This section designs a comparison experiment on the DOTA data set (Xia et al. 2018) and selects four remote sensing images to verify the effectiveness of the proposed segmentation algorithm. We also compare it with KAZE (Pourfard et al. 2021) and CPLF (Su et al. 2022). The experiments are conducted in MATLAB 2017a.
Figure 6 displays the segmentation results of the different methods. The proposed method gives the better segmentation effect. KAZE and CPLF produce incoherent segmentation along object edges, and over-segmentation is also a serious problem for them. When the background is too complex, their segmentation is very inefficient. The method in this paper effectively alleviates these problems.
Segmentation accuracy and segmentation time are used as evaluation indexes. The detailed results are shown in Table 1.
As can be seen from Table 1, for image 1, the segmentation accuracy of the proposed method reaches 97.5%, while the accuracies of KAZE and CPLF are 89.6% and 92.1%, which are 7.9 and 5.4 percentage points lower than that of the proposed method. For image 2, the segmentation accuracy of the proposed method reaches 96.4%, which is 6.0 percentage points higher than that of KAZE and 4.7 percentage points higher than that of CPLF. The results for the other two images are similar. The segmentation time of the proposed method is also the shortest. Overall, the proposed method achieves the best accuracy and efficiency among the compared methods.
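The accuracy figures above can be read as a per-pixel agreement rate. The sketch below assumes that segmentation accuracy means the fraction of pixels whose predicted label equals the reference label; the paper does not state its exact definition, so this metric and the function name `pixel_accuracy` are assumptions.

```python
import numpy as np

def pixel_accuracy(pred, truth):
    """Fraction of pixels whose predicted label equals the reference label."""
    pred = np.asarray(pred)
    truth = np.asarray(truth)
    if pred.shape != truth.shape:
        raise ValueError("prediction and reference must have the same shape")
    return float(np.mean(pred == truth))

# Usage: three of the four pixels agree with the reference labeling.
accuracy = pixel_accuracy([[1, 0], [1, 1]], [[1, 0], [0, 1]])
```

With this definition, a gap between two methods is naturally expressed in percentage points, as in the comparison above.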
Next, we conduct a set of ablation experiments with and without SURF to show the advantage of removing mismatched points and to demonstrate the effectiveness of the matching algorithm for segmentation. Figure 7 clearly shows that the segmentation effect is significantly improved after adding SURF. The minimum efficiency is above 96%.

CONCLUSION
Gaussian mixture model image segmentation is one of the main research tasks in remote sensing image segmentation. Therefore, this paper proposes a novel remote sensing image segmentation method based on the Gaussian mixture model and the SURF algorithm. By using a new energy function to change the iteration process, the new method reduces the large amount of time spent by the traditional GMM algorithm in the segmentation iteration, and alleviates the under-segmentation or over-segmentation that remains when the iteration converges. In addition, the SURF matching algorithm is used to eliminate mismatched points, which reduces the complexity of feature description and improves the computing speed. Finally, experiments are carried out on the open remote sensing DOTA data set. The subjective and objective evaluation indexes show that the new method significantly improves segmentation accuracy and segmentation time. In future work, each object category will be modeled as a GMM component to accurately depict the statistical distribution characteristics of the pixel spectral measurements.

CONFLICTS OF INTEREST
I declare that the guest editor of the special issue is an author of this article.

Figure 1. Proposed remote sensing image segmentation structure
Figure 2. Integral image method
Figure 3. Box filter
Figure 5. Determination of main direction
Figure 6. Segmentation with different methods
$L_{xy}(\mathbf{x}, \sigma)$ and $L_{yy}(\mathbf{x}, \sigma)$ are defined similarly to $L_{xx}(\mathbf{x}, \sigma)$. The convolution operation is transformed into a box filtering operation, which is a simplification of the Gaussian second-order differential template. A 9×9 box filter is an approximation of a Gaussian filter with $\sigma = 1.2$. It is used as the lowest scale-space value in image filtering and spot detection. The values after the convolution operation are $D_{xx}$, $D_{xy}$ and $D_{yy}$, respectively. They replace $L_{xx}$, $L_{xy}$ and $L_{yy}$, as shown in Figure 3.

Table 1. Segmentation results with different methods
This work was supported in part by the National Natural Science Foundation of China under Grants 62071084 and 62001434, by the Talents Project of the State Ethnic Affairs Commission, and by the Scientific Research Funds of the Education Department of Liaoning Province in 2021 (General Project) (LJKZ1311).