Use of PCA Solution in Hierarchy and Case of Two Classes of Data Sets to Initialize Parameters for Clustering Function: An Estimation Method for Clustering Function in Classification Application

Peilin Li
DOI: 10.4018/978-1-7998-4444-0.ch010

Abstract

In the literature, the initial parameters are known to be critical to the K-means function. When they are seeded randomly or by ad hoc approaches, the results are not optimal. This chapter details an estimation method that uses the principal component analysis (PCA) solution, based on the connection established in the research between the PCA solution and the cluster membership produced by K-means. Full mathematical justification is provided. The method has been evaluated empirically in a comparative study, and the validation results demonstrate the feasibility of the proposed method for initializing the parameters.
Background

A machine vision system takes images as input to identify targets against a feature-rich background. Identifying such targets is nontrivial because natural image data is typically nonlinear, with an unknown density in the color space (Jimenez, Ceres, & Pons, 2000). Motivated by autonomous human vision, unsupervised machine learning has been investigated and applied in this setting (LeCun, Bengio, & Hinton, 2015). Among these approaches, clustering is one of the classical methods for classification applications (Duda, Hart, & Stork, 2000; Jain, 2010). Without any prior knowledge of the cluster information, clustering remains a challenging problem because of the difficulty of designing a proper objective or metric for the structure of the data. Among the categories of clustering (Xu & Wunsch, 2005), partitional clustering incorporates the shape and the number of clusters through certain metrics and prototypes in a squared-error function; K-means is a canonical example. However, the function requires several parameters (Du & Swamy, 2006): the number of clusters K, the initial values for the clusters, and the dissimilarity metric. In the original K-means algorithm, these parameters are chosen randomly (Chen, Tai, Harrison, & Pan, 2005; Pena, Lozano, & Larranaga, 1999). Hence, the seeding of these initial parameters can direct the function to converge to different local optima, a consequence of the multivariate nature of the problem (Xu & Wunsch, 2005).
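To make the idea of PCA-based seeding concrete, the following is a minimal sketch for the two-class case: the sign of each sample's score on the first principal component gives an initial two-way partition, whose means seed the K-means centroids. This is an illustrative reconstruction of the general approach, not the chapter's exact derivation; the function name and iteration count are assumptions.

```python
import numpy as np

def pca_init_kmeans(X, n_iter=10):
    """Two-cluster K-means seeded from the first principal component.

    Illustrative sketch: the sign of each sample's first-PC score gives
    an initial two-way split, and the split means serve as the initial
    centroids, followed by a few Lloyd iterations.
    """
    Xc = X - X.mean(axis=0)                  # center the data
    # First principal direction via SVD of the centered data matrix
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[0]                      # projections onto the first PC
    labels = (scores >= 0).astype(int)       # initial two-way partition
    centroids = np.array([X[labels == k].mean(axis=0) for k in (0, 1)])
    # Refine with standard Lloyd iterations from the PCA-derived seed
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        centroids = np.array([X[labels == k].mean(axis=0) for k in (0, 1)])
    return centroids, labels
```

For well-separated data, the PCA-derived seed already lies close to the final solution, so the subsequent Lloyd iterations converge quickly and deterministically, avoiding the run-to-run variability of random seeding.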

Key Terms in this Chapter

Classification: Classification is an application of pattern recognition in which a data instance is assigned a label. The assignment is realized by measurement with a certain dissimilarity metric.

Unsupervised Learning: Unsupervised learning is a self-mapping or self-organized learning procedure that groups different patterns without pre-learned parameters or labels.

Supervised Learning: Supervised learning is the common form of machine learning in which labeled training samples are used to find parameters, such as the weights of a hyperplane, for the subsequent classification application.

Initial Parameters: When a function such as a clustering function involves multiple variables or factors, the number of these variables and their starting values are defined as the initial parameters.

Clustering: Clustering is a procedure that groups different objects into the same category when they share similar properties as measured by specific metrics.

Thresholding: Thresholding is a method for segmenting image data with a hyperplane.

Image Segmentation: Image segmentation subdivides an image into constituent features that differ from one another in certain properties, as measured by dissimilarity metrics.
