A Hybrid Clustering Technique to Improve Patient Data Quality

Narasimhaiah Gorla (Wayne State University, USA) and Chow Y.K. Bennon (Hong Kong Polytechnic University, Hong Kong)
Copyright: © 2003 | Pages: 21
DOI: 10.4018/978-1-93177-749-0.ch012


The demographic and clinical description of each patient is recorded in the databases of various hospital information systems. Errors in patient data include wrong data entry, information not provided by the patient, and improper identification of patients (for example, tourists in Hong Kong). Because of these errors, records of the same patient can appear as records of different patients. To solve this problem, we use “clustering,” a data mining technique, to group “similar” patients together. We applied three algorithms to the patient data using a C program: hierarchical clustering, partitioned clustering, and a hybrid algorithm combining the two. We used six attributes of patient data (Sex, DOB, Name, Marital status, District, and Telephone number) as the basis for computing similarity, with a weight assigned to each attribute. We found that the hybrid algorithm produced more accurate groupings than the other two algorithms, had a smaller mean square error, and executed faster. Because of the privacy ordinance, true patient data is not shown; only simulated data is used.
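The weighted similarity over the six attributes can be illustrated with a minimal sketch. It is written in Python for brevity rather than the C used in the chapter, and it is not the authors' implementation. The weight values, the 0.8 threshold, the approximate name matching via difflib, and the function names are all assumptions made for illustration. The grouping step is a plain single-link pass, which stands in for the hierarchical stage only; it does not reproduce the hybrid algorithm.

```python
from difflib import SequenceMatcher

# Hypothetical weights (assumption): the abstract does not state the actual values.
WEIGHTS = {
    "sex": 0.10,
    "dob": 0.25,
    "name": 0.30,
    "marital_status": 0.05,
    "district": 0.10,
    "telephone": 0.20,
}


def field_similarity(field, a, b):
    """Score one attribute in [0, 1]; a missing value on either side scores 0."""
    if not a or not b:
        return 0.0
    if field == "name":
        # Approximate match so that small typing errors still score highly.
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return 1.0 if a == b else 0.0


def record_similarity(r1, r2):
    """Weighted sum of the per-attribute scores (the weights sum to 1)."""
    return sum(
        w * field_similarity(f, r1.get(f), r2.get(f)) for f, w in WEIGHTS.items()
    )


def group_records(records, threshold=0.8):
    """Single-link grouping: any pair scoring >= threshold ends up in one group."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the group representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if record_similarity(records[i], records[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Simulated records (invented for this sketch, not real patient data).
records = [
    {"sex": "M", "dob": "1970-01-15", "name": "Chan Tai Man",
     "marital_status": "M", "district": "Kowloon", "telephone": "21234567"},
    {"sex": "M", "dob": "1970-01-15", "name": "Chan Tai Mann",
     "marital_status": "M", "district": "Kowloon", "telephone": "21234567"},
    {"sex": "F", "dob": "1985-06-30", "name": "Wong Siu Ling",
     "marital_status": "S", "district": "Sha Tin", "telephone": "29876543"},
]

print(group_records(records))
```

With these simulated records, the first two differ only by a one-letter typing error in the name, so they fall into the same group, while the third record stays on its own. Raising a weight makes disagreement on that attribute more costly, which is the role the attribute weights play in the similarity computation described above.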
