Analyzing datasets and classifying them into clusters based on known properties is a well-known problem with applications in fields such as finance (e.g., pricing), computer science (e.g., image processing), marketing (e.g., market segmentation), and medicine (e.g., diagnostics), among others (Cadez, Heckerman, Meek, Smyth, & White, 2003; Clifford & Stevenson, 2005; Erlich, Gelbard, & Spiegler, 2002; Jain & Dubes, 1988; Jain, Murty, & Flynn, 1999; Sharan & Shamir, 2002). Currently, researchers and business analysts alike must test each algorithm and parameter setting separately in order to establish their preference for the individual decision problem they face. Moreover, no supportive model or tool is available to help them compare the different clustering results yielded by these algorithm and parameter combinations. Commercial products neither show the resulting clusters of multiple methods nor provide the researcher with effective tools for analyzing and comparing the outcomes of the different methods. To overcome these challenges, a decision support system (DSS) has been developed. The DSS uses a matrix presentation of multiple cluster divisions based on the application of multiple algorithms. The presentation is independent of the actual algorithms used, and it is up to the researcher to choose the most appropriate algorithms based on his or her personal expertise.
Key Terms in this Chapter
Likelihood Measurement: Likelihood measurement is the measure that allows for the classification of a dataset using hierarchical clustering algorithms. It measures the extent to which a sample and a cluster are alike.
Vote Matrix: Vote matrix is a graphical tool used to present a dataset classification using multiple algorithms.
Hierarchical Clustering Algorithms: Hierarchical clustering algorithms are clustering methods that classify datasets by starting with each sample as its own cluster and gradually merging samples into clusters based on their likelihood measure.
Dendrogram: A dendrogram is a tree diagram used to present the classification produced by a hierarchical clustering algorithm.
Distance From Second Best (DFSB): DFSB is a method of calculating the distribution of votes for a certain sample. This method is based on the difference between the highest number of similar associations and the second-highest number of similar associations.
Heterogeneity Meter: Heterogeneity meter is a measure of how heterogeneous a certain association of clusters, resulting from the implementation of an algorithm, is.
Decision Support System (DSS): DSS is a system used to help resolve certain problems or dilemmas.
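
The vote matrix and DFSB terms defined above can be illustrated with a minimal Python sketch. It rests on several assumptions not stated in the chapter: the cluster labels produced by the different algorithms have already been aligned so that the same label means the same cluster, the runner-up count is taken as zero when all algorithms agree, and no normalization is applied. The function name, the sample identifiers, and the data are hypothetical and serve only as an example.

```python
from collections import Counter


def distance_from_second_best(votes):
    """Return the top vote count minus the runner-up count for one sample.

    `votes` holds one cluster label per algorithm (one row of a vote matrix).
    Assumption: if every algorithm agrees, the runner-up count is zero.
    """
    counts = sorted(Counter(votes).values(), reverse=True)
    second = counts[1] if len(counts) > 1 else 0
    return counts[0] - second


# Hypothetical vote matrix: each row is a sample, each column is the cluster
# label assigned by one algorithm (labels assumed to be aligned beforehand).
vote_matrix = {
    "s1": [1, 1, 1, 1],  # all four algorithms agree
    "s2": [1, 1, 2, 1],  # three votes against one
    "s3": [2, 3, 2, 3],  # evenly split
}

dfsb = {sample: distance_from_second_best(row) for sample, row in vote_matrix.items()}
print(dfsb)
```

Under these assumptions, a larger DFSB value indicates stronger agreement among the algorithms about a sample, while a value of zero flags a sample whose association is split between two clusters.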