Categorical Data Clustering Using Harmony Search Algorithm for Healthcare Datasets

Abha Sharma, Pushpendra Kumar, Kanojia Sindhuben Babulal, Ahmed J. Obaid, Harshita Patel
Copyright: © 2022 | Pages: 15
DOI: 10.4018/IJEHMC.309440

Abstract

Healthcare analytics provides many benefits in healthcare dashboard systems. Healthcare datasets mainly contain categorical attributes. This paper proposes an optimized clustering algorithm for healthcare datasets named harmony search based categorical clustering (HSCC). The k-modes algorithm is one of the best-known categorical data clustering algorithms, but it produces locally optimal clusters. Researchers generally use genetic algorithm (GA) based clustering algorithms to move from locally optimal solutions toward globally optimal ones, but GA has deficiencies such as premature convergence and low speed. In this paper, the harmony search (HS) optimization algorithm is used to optimize the clustering results. The results show that the proposed HSCC algorithm produces globally optimized, unbiased, and mature results. HSCC achieves 98% accuracy on the dental dataset and 71% on the lung cancer dataset, whereas GACC achieves 95% and 65%, respectively.

Introduction

Over the past few decades, various sectors, including healthcare, have moved from paper-based systems to electronic systems (ES). An ES improves the productivity and outcomes of the sector in which it is applied (Yoo et al., 2012). Healthcare organizations and institutes collect electronic health data from online insurance claims, computer-based surveys, Electronic Health Records (EHR), and many other sources. EHRs have improved access to patient data gathered from hospitals, clinics, and other health service providers (Tekieh & Raahemi, 2015). Blockchain ledger technology helps healthcare researchers transfer patient medical logs securely and quickly and manage the drug supply chain (Haleem, Javaid, Singh, Suman, & Rab, 2021). Given the sensitivity of the medical domain, computer scientists must extract correct insights from healthcare datasets. Healthcare-related datasets (Kumar Rai, Sharma, Kumar, & Goyal, 2021; Rai & Srivastava, 2014, 2016, 2017) typically contain categorical attributes such as gender, age range, and symptoms. The purpose of this paper is to identify clusters and predict dental disease and lung cancer. Various analyses, such as disease classification (P. Kumar & Thakur, 2019, 2021; Harshita Patel, Rajput, Stan, & Miclea, 2022; D. S. Rajput et al., 2021; Reddy et al., 2020), clustering, prediction, and logistic regression, are being carried out to find the inherent structures or patterns in these datasets. Cluster analysis (H Patel & Rajput, 2011; D. S. Rajput, 2019) is an important technique for finding the inherent structure of a dataset. Because healthcare and medical datasets contain many categorical attributes, a categorical clustering algorithm is required, and among such algorithms k-modes is both straightforward and fast.
However, to obtain optimal solutions, such algorithms must be hybridized with optimization algorithms that drive local solutions toward global ones. Sensitivity to initial values is a key challenge for these clustering approaches: they behave like hill climbing, which moves along a single path without exploring the wider search space, so they can settle on suboptimal solutions. A poor choice of initial cluster seeds therefore usually yields poor clustering results. This paper proposes the HS based Categorical data Clustering algorithm (HSCC), a hybridization of Harmony Search (HS) (Dubey, Kumar, Kaur, & Dao, 2021) with k-modes that reduces the problem of biased results.
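The hill-climbing behaviour described above can be seen in a minimal Python sketch of the standard k-modes loop (simple matching dissimilarity, attribute-wise modes). It is an illustration, not the authors' implementation: the function names, the five toy rows, and the tie-breaking rules (nearest mode with the lowest index, first-seen value for a tied mode) are assumptions. Run from two different seedings, the same loop converges to different costs.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Row = Tuple[str, ...]


def mismatch(x: Sequence[str], y: Sequence[str]) -> int:
    """Simple matching dissimilarity: number of attributes on which x and y differ."""
    return sum(a != b for a, b in zip(x, y))


def column_mode(rows: List[Row]) -> Row:
    """Attribute-wise mode of a cluster; ties go to the value seen first (assumed rule)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(data: List[Row], initial_modes: List[Row], max_iter: int = 100):
    """Alternate nearest-mode assignment and mode update until labels stop changing."""
    modes = [tuple(m) for m in initial_modes]
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        # Assignment step: nearest mode, ties broken toward the lower cluster index.
        new_labels = [
            min(range(len(modes)), key=lambda j: mismatch(row, modes[j]))
            for row in data
        ]
        if new_labels == labels:
            break  # converged to a fixed point, not necessarily the global optimum
        labels = new_labels
        # Update step: recompute each non-empty cluster's mode; empty clusters keep theirs.
        for j in range(len(modes)):
            members = [row for row, lab in zip(data, labels) if lab == j]
            if members:
                modes[j] = column_mode(members)
    cost = sum(mismatch(row, modes[lab]) for row, lab in zip(data, labels))
    return labels, modes, cost


if __name__ == "__main__":
    rows = [("a", "a"), ("a", "a"), ("b", "b"), ("b", "b"), ("c", "c")]
    for seeds in ([("a", "a"), ("c", "c")], [("a", "a"), ("b", "b")]):
        labels, modes, cost = k_modes(rows, seeds)
        print(seeds, "->", labels, "cost", cost)
    # The first seeding stops at cost 4; the second reaches cost 2.
```

With seeds (a, a) and (c, c), the two (b, b) rows are equally far from both modes, join the first cluster, and the loop stops at a fixed point of cost 4; seeding with (a, a) and (b, b) reaches cost 2. Neither run explores beyond its starting point, which is the seed sensitivity that the hybridization with HS is intended to reduce.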

Harmony search (HS) is a population-based optimization algorithm that mimics the behavior of musicians (V. Kumar, Chhabra, & Kumar, 2012). It was originally inspired by the improvisation of jazz musicians (Moh’d Alia, Al-Betar, Mandava, & Khader, 2011). In HS, decision variables are treated as musicians (Peraza, Valdez, & Castillo, 2015). Just as a musician plays notes, a decision variable generates values (Shi, Han, & Si, 2013). And just as musicians work through notes to create the best harmony, decision variables generate values in search of the global optimum. The similarity between musicians and HS is shown in Figure 1.

Figure 1. Harmony search and musicians’ behavior
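To make the musician analogy concrete, here is a minimal Python sketch of a harmony search loop applied to categorical cluster modes. It is an illustration under stated assumptions, not the authors' HSCC: each harmony is assumed to be a set of k candidate modes, the objective is assumed to be the total simple matching dissimilarity from each row to its nearest mode, and the names harmony_search, hms (harmony memory size), hmcr (harmony memory considering rate), par (pitch adjusting rate), and their default values are hypothetical choices. Because categorical values have no natural neighbours, pitch adjustment here simply resamples a value from the attribute's domain.

```python
import random
from typing import List, Sequence, Tuple

Row = Tuple[str, ...]
Harmony = List[Row]  # one candidate solution: k cluster modes (assumed encoding)


def mismatch(x: Sequence[str], y: Sequence[str]) -> int:
    """Simple matching dissimilarity between two categorical rows."""
    return sum(a != b for a, b in zip(x, y))


def clustering_cost(data: List[Row], modes: Harmony) -> int:
    """Objective to minimise: each row's dissimilarity to its nearest mode, summed."""
    return sum(min(mismatch(row, m) for m in modes) for row in data)


def harmony_search(
    data: List[Row],
    k: int,
    hms: int = 5,  # harmony memory size (hypothetical default)
    hmcr: float = 0.9,  # harmony memory considering rate (hypothetical default)
    par: float = 0.3,  # pitch adjusting rate (hypothetical default)
    iterations: int = 200,
    seed: int = 0,
) -> Tuple[Harmony, int, int]:
    rng = random.Random(seed)
    n_attr = len(data[0])
    # Candidate values per attribute: only the values observed in the data.
    domains = [sorted({row[a] for row in data}) for a in range(n_attr)]

    def random_harmony() -> Harmony:
        return [tuple(rng.choice(domains[a]) for a in range(n_attr)) for _ in range(k)]

    memory = [random_harmony() for _ in range(hms)]
    costs = [clustering_cost(data, h) for h in memory]
    initial_best = min(costs)

    for _ in range(iterations):
        new: Harmony = []
        for c in range(k):
            values = []
            for a in range(n_attr):
                if rng.random() < hmcr:
                    # Memory consideration: reuse this value from a stored harmony.
                    value = rng.choice(memory)[c][a]
                    if rng.random() < par:
                        # Pitch adjustment (assumed): no neighbouring categories,
                        # so resample from the attribute's domain.
                        value = rng.choice(domains[a])
                else:
                    # Random selection from the whole domain.
                    value = rng.choice(domains[a])
                values.append(value)
            new.append(tuple(values))
        # Replace the worst stored harmony only if the new one is strictly better.
        new_cost = clustering_cost(data, new)
        worst = max(range(hms), key=costs.__getitem__)
        if new_cost < costs[worst]:
            memory[worst], costs[worst] = new, new_cost

    best = min(range(hms), key=costs.__getitem__)
    return memory[best], costs[best], initial_best


if __name__ == "__main__":
    rows = [("a", "a"), ("a", "a"), ("b", "b"), ("b", "b"), ("c", "c")]
    modes, cost, start = harmony_search(rows, k=2)
    print("best modes:", modes, "cost:", cost, "best initial cost:", start)
```

Since a new harmony only ever replaces the worst one in memory, the best cost in memory never increases, while memory consideration and random selection let the search leave the single path that a hill climber such as plain k-modes follows.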
