Discovering Knowledge by Comparing Silhouettes Using K-Means Clustering for Customer Segmentation

Zeeshan Akbar, Jun Liu, Zahida Latif
Copyright: © 2020 | Pages: 19
DOI: 10.4018/IJKM.2020070105


A large amount of data is generated every day from different sources. Knowledge extraction is the discovery of useful and potentially valuable information in data that can support better decisions. Today's business processes require intelligent techniques capable of discovering useful patterns in data, an activity called data mining. This research uses silhouettes created from K-means clustering to extract knowledge. The paper implements the K-means clustering technique to group customers into K clusters according to the deals they purchased, in two different scenarios, using an evolutionary algorithm for optimization, and compares the silhouettes for different values of K to analyze the improvement in extracted knowledge.

1. Introduction

In companies, market or customer segmentation has become a significant tool for product portfolios and marketing strategies, because it is difficult to take customers' purchase data and understand it, or to deal personally with each customer. Companies analyze customer behavior by understanding customers' preferences and needs. With the help of clustering, they group customers into segments whose members exhibit similar characteristics, such as demands or preferences (Silver 2018). By collecting data about their customers and extracting knowledge from the raw data, for example with mining techniques that reveal customer buying habits, companies gain an added advantage. Clustering is the most common data mining technique for customer segmentation (Saglam et al. 2006).

In cluster analysis, we gather a set of objects and separate them into groups of similar objects. Similarity is measured using Euclidean distance, although other distance measures, such as Manhattan and Minkowski distance, also exist. The quality of the clusters depends on discovering patterns that reveal some or all of the hidden knowledge. We then explore these groups by determining how they are similar to and different from each other. Clustering in this context is called exploratory data mining (Foreman 2013).

Customer segmentation supports effective business decisions by deriving knowledge that is otherwise difficult to find, such as relationships between products and each segment or between customers, which helps in managing demand and supply. Clustering also helps identify crime in certain areas and recommend movies to clusters of viewers with similar tastes by identifying relationships in a population.
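To make the distance measures mentioned above concrete, here is a minimal Python sketch. It assumes customers are represented as numeric feature vectors; the two features used here (number of deals bought and average spend) are hypothetical and not taken from the paper. Euclidean and Manhattan distance are the p = 2 and p = 1 cases of the Minkowski distance.

```python
def minkowski(a, b, p):
    """Minkowski distance of order p (p >= 1) between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    if p < 1:
        raise ValueError("p must be at least 1")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def manhattan(a, b):
    """Order-1 Minkowski distance: sum of absolute coordinate differences."""
    return minkowski(a, b, 1)


def euclidean(a, b):
    """Order-2 Minkowski distance: straight-line distance."""
    return minkowski(a, b, 2)


# Hypothetical customers: (number of deals bought, average spend).
c1, c2 = (3.0, 4.0), (0.0, 0.0)
print(euclidean(c1, c2))      # 5.0
print(manhattan(c1, c2))      # 7.0
print(minkowski(c1, c2, 3))   # between the two, about 4.5
```

Which measure is appropriate depends on how the customer features are scaled; features with large numeric ranges dominate all three distances unless they are normalized first.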

In this paper, we use K-means, a popular partition-based clustering approach, for market segmentation of customers so that targeted product content can be marketed to each segment. The method first creates K initial partitions, where the parameter K is the required number of clusters, and then uses an iterative relocation technique to improve the partitioning of customers. It uses a center-based cluster criterion: each cluster is represented by its center, often called the cluster centroid, which is the mean of the cluster's members. This is where K-means gets its name.
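The center-based criterion can be illustrated with a short sketch: a centroid is simply the coordinate-wise mean of the points in a cluster. The helper name `centroid` and the sample segment below are hypothetical, chosen only for illustration.

```python
def centroid(points):
    """Return the coordinate-wise mean of a non-empty list of equal-length points."""
    if not points:
        raise ValueError("an empty cluster has no mean")
    # zip(*points) groups the first coordinates together, then the second, etc.
    return tuple(sum(coords) / len(points) for coords in zip(*points))


# A hypothetical segment of three customers: (deals bought, average spend).
segment = [(2.0, 10.0), (4.0, 14.0), (6.0, 12.0)]
print(centroid(segment))  # (4.0, 12.0)
```

During iterative relocation, moving a customer from one segment to another changes the means of both segments, which is why centroids are recomputed after every reassignment.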

In the K-means clustering process, data points are placed in K groups, each defined by a central point (the mean) called the cluster centroid. To validate the clusters formed, we compute a score called the silhouette. An average silhouette value close to +1 means that customers are well assigned to their clusters and the extracted knowledge is reliable; a value close to 0 means that the cluster assignments are somewhat shaky, with many customers lying near cluster boundaries; and a negative value means that many customers are closer to another cluster than to their own. We implement two different scenarios by varying the value of K and extract knowledge in each. For each value of K, this knowledge, in the sense of the revised knowledge pyramid for big data described by Jennex (2017), is analyzed according to the popularity of deals among the clusters to obtain actionable intelligence. The average silhouette value for each K is also computed to see whether knowledge extraction improves and to help describe the clusters better.
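The following is a minimal sketch of the standard silhouette computation (Rousseeuw's definition) with Euclidean distance; it is not necessarily the exact tool used in the paper. For each point, a is its mean distance to the other members of its own cluster and b is its smallest mean distance to the members of any other cluster, so s = (b - a) / max(a, b) lies between -1 and +1. The four customers and both labelings are hypothetical and only show how a good grouping scores near +1 while a poor one scores below 0.

```python
import math


def silhouette_values(points, labels):
    """Per-point silhouette s(i) = (b - a) / max(a, b) using Euclidean distance."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    values = []
    for i, p in enumerate(points):
        own = labels[i]
        # Distances from point i to the other members of its own cluster.
        same = [math.dist(p, q) for j, q in enumerate(points)
                if labels[j] == own and j != i]
        if not same:  # singleton cluster: defined as 0 by convention
            values.append(0.0)
            continue
        a = sum(same) / len(same)
        # Smallest mean distance from point i to any other cluster.
        b = min(
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c)
            / labels.count(c)
            for c in clusters if c != own
        )
        values.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return values


def average_silhouette(points, labels):
    """Mean silhouette over all points; used to compare different values of K."""
    values = silhouette_values(points, labels)
    return sum(values) / len(values)


# Hypothetical customers as 2-D feature vectors.
customers = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = [0, 0, 1, 1]  # nearby customers grouped together
bad = [0, 1, 0, 1]   # nearby customers split apart
print(round(average_silhouette(customers, good), 2))  # 0.9
print(round(average_silhouette(customers, bad), 2))   # -0.45
```

To compare values of K as described above, one would cluster the same data once per candidate K and compare the resulting average silhouettes; a higher average suggests better-separated clusters.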

1.1. K-Means Clustering

In the K-means clustering process, the user specifies the value of K, the desired number of clusters, and K initial centroids are selected. Clusters are then formed by assigning each point in the data to the closest centroid (Jain 2010; Li et al. 2018). The centroids are updated based on the points allocated to each cluster, and the process repeats until no point changes its cluster (Yin et al. 2018). Mathematically:

$$J = \sum_{i=1}^{M} \min_{1 \le k \le K} \left\lVert x^{(i)} - w_k \right\rVert^2$$

In the above equation, wk are the cluster centroids, x(i) denotes the i-th data point, and M is the total number of data points. The objective function J adds up the squared Euclidean distance ‖x(i) - wk‖² between each data point and the centroid of its cluster, and K-means seeks the centroids that minimize J. A score called the silhouette is computed to validate the chosen value of K, and this score is used to compare different values of K. The silhouette value ranges from -1 to +1.
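The steps described in this section can be sketched as a plain Lloyd-style K-means loop that also reports the objective J. This is an illustration under stated assumptions, not the paper's implementation: initial centroids are drawn at random from the data (this section does not say how the initial centroids are chosen, and the paper also uses an evolutionary algorithm for optimization, which this sketch does not model), and an empty cluster keeps its previous centroid. The customer data is hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance ||a - b||^2."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd-style K-means; returns (centroids, labels, J)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    # Assumption: pick K distinct data points as the initial centroids.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assign every point to its closest centroid.
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # no point changed cluster: converged
            break
        labels = new_labels
        # Move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    # Objective J: total squared distance from each point to its assigned centroid.
    j = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, j


# Hypothetical customers: (deals bought, average spend).
customers = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, labels, j = kmeans(customers, k=2)
print(labels)       # the first three customers share one label, the last three the other
print(round(j, 3))  # 1.833
```

Because each iteration only reassigns points to closer centroids and then moves centroids to cluster means, J never increases; the loop can still stop in a local minimum that depends on the initial centroids, which is one reason to compare several runs or several values of K.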
