Clustering

Copyright: © 2023 | Pages: 21
DOI: 10.4018/978-1-6684-4730-7.ch007

Abstract

Clustering divides a data set into an appropriate number of groups. It is a form of unsupervised learning, which means a data scientist can bring unlabelled features of interest into the mining model without a predefined target label. After dividing the data set, the data scientist can then label each cluster. In business, clustering is used to analyze customer or product segments that match a target market. This chapter introduces clustering techniques including k-means, hierarchical clustering, and DBSCAN, as well as techniques for assessing the efficiency of a clustering analysis. Data scientists can assess this efficiency in two ways. The first is subjective measurement, in which a data scientist consults a domain expert to confirm the efficiency of the cluster analysis. The second is objective measurement, which tests the cluster analysis result based on calculations. This chapter demonstrates cluster analysis with RapidMiner so that readers can follow the process step by step.
Chapter Preview

K-Means Clustering

When data scientists are asked about the nature of the data, such as the purchasing behavior of each group of customers, they need to find a data set that fits those questions (Ginting, 2021; Puspasari et al., 2021). The data set initially received is characterized as unlabeled data: data that has not yet been clustered or given a name or meaning. These data can therefore be clustered and defined according to the objectives of the analysis, for example by defining customer groups as middle-class or wealthy.
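
The idea of clustering unlabeled data and naming the clusters afterwards can be sketched in a few lines of Python. The chapter itself works in RapidMiner, so this is only an illustrative sketch and not the chapter's procedure. The customer figures, the `kmeans` helper, the choice of seeding centroids with the first k points, and the rule of naming clusters by average income are all assumptions made for the example.

```python
import math


def kmeans(points, k, iterations=100):
    """Minimal k-means (Lloyd's algorithm) in pure Python.

    Assumption for reproducibility: the first k points seed the centroids.
    Real tools usually pick the starting centroids at random.
    """
    centroids = [tuple(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                mean = tuple(sum(dim) / len(members) for dim in zip(*members))
                new_centroids.append(mean)
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # Centroids stopped moving, so the result is stable.
        centroids = new_centroids
    return assignments, centroids


# Hypothetical unlabeled data: (monthly income, monthly spend), in thousands.
customers = [(30, 5), (120, 40), (32, 6), (28, 4), (125, 45), (118, 38)]

assignments, centroids = kmeans(customers, k=2)

# Name the clusters only after clustering, here by average income
# (an assumed rule; in practice a domain expert would choose the names).
by_income = sorted(range(2), key=lambda c: centroids[c][0])
names = {by_income[0]: "middle-class", by_income[1]: "wealthy"}
labels = [names[a] for a in assignments]

for customer, label in zip(customers, labels):
    print(customer, label)
```

The algorithm never sees the names "middle-class" or "wealthy"; they are attached to the groups only once the clustering is done, which is what separates this from supervised learning.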
