Data Mining-Based Privacy Preservation Technique for Medical Dataset Over Horizontal Partitioned

Shivlal Mewada
Copyright: © 2021 |Pages: 17
DOI: 10.4018/IJEHMC.20210901.oa4

Abstract

Data mining techniques extract valuable information from data. Privacy-preserving data mining (PPDM) techniques have recently been widely adopted to secure and protect information and data. These techniques convert the original dataset into a protected dataset through swapping, modification, and deletion functions. The proposed technique works in two steps. In the first step, cloud computing is treated as a service platform for determining the optimum horizontal partitioning of the given data; the K-Means++ algorithm is implemented to determine the horizontal partitioning on the cloud platform without disclosing the cluster center information. The second step comprises data protection and recovery phases: noise is incorporated into the database to maintain the privacy and semantics of the data, and a seed function is used to protect the original databases. The effectiveness of the proposed technique is evaluated on several benchmark medical datasets using encryption time, execution time, accuracy, and F-measure.
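The protection and recovery phases can be illustrated with a minimal sketch. The abstract does not specify the noise distribution or the form of the seed function, so the uniform noise, the `scale` parameter, and the function names `perturb` and `recover` below are assumptions made purely for illustration, not the author's implementation.

```python
import random

def perturb(values, seed, scale=1.0):
    """Protection phase (illustrative): add seeded pseudo-random noise to each value."""
    rng = random.Random(seed)  # the seed acts as the shared secret (assumption)
    return [v + rng.uniform(-scale, scale) for v in values]

def recover(protected, seed, scale=1.0):
    """Recovery phase (illustrative): regenerate the same noise from the seed and subtract it."""
    rng = random.Random(seed)
    return [p - rng.uniform(-scale, scale) for p in protected]

# Hypothetical attribute values; not taken from any dataset in the article.
original = [120.0, 135.5, 98.2]
protected = perturb(original, seed=42, scale=5.0)
restored = recover(protected, seed=42, scale=5.0)
```

Only a party holding the seed can regenerate the noise sequence and undo the perturbation. Recovery is exact up to floating-point rounding. Python's `random` module is not cryptographically secure, so this sketch conveys the idea only.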

1. Introduction

At present, large volumes of information are gathered by businesses, institutes, and government offices, and this information is growing exponentially (Afzali & Mohammadi, 2017; Li, Lu, Choo et al., 2016). Several data mining tools are available to process the collected information and identify valuable patterns for decision making. However, data mining tools can also expose hidden information about individuals, including sensitive, private, and confidential details. Hence, several privacy-preserving data mining (PPDM) methods have been developed in the literature to handle these issues (Chamikara et al., 2018; Lin et al., 2016; Mehta & Rao, 2017). These methods secure the database through perturbation, that is, they ensure privacy by transforming the database. The literature reports that PPDM methods provide more privacy at the mining stage, and that privacy can also be handled during preprocessing and postprocessing operations (Komishani et al., 2016; Upadhyay et al., 2018; Yun & Kim, 2015). PPDM also addresses the misuse of sensitive information belonging to a person or an organization: the data is modified in such a manner that an intruder cannot infer anything about the sensitive information, so the sensitive information remains protected (Dong & Pi, 2018; Qi & Zong, 2012). In parallel, distributed data mining techniques have become popular for extracting information from distributed resources. Several models have been designed around distributed data mining, in which collaboration among multiple parties is used to build an effective model while avoiding leakage of information. In distributed data mining, the data can be partitioned either horizontally or vertically (Oliveira & Zaïane, 2007). Figure 1 illustrates the concepts of horizontal and vertical partitioning.
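The two partitioning schemes can be made concrete with a small sketch. In horizontal partitioning, each party holds a different subset of records with the same attributes. In vertical partitioning, each party holds different attributes for the same records. The toy records, the round-robin row assignment, and the helper names below are assumptions for illustration only.

```python
# Hypothetical patient records; attribute names and values are invented for illustration.
records = [
    {"id": 1, "age": 54, "bp": 130, "diagnosis": "A"},
    {"id": 2, "age": 61, "bp": 145, "diagnosis": "B"},
    {"id": 3, "age": 47, "bp": 120, "diagnosis": "A"},
    {"id": 4, "age": 39, "bp": 118, "diagnosis": "C"},
]

def horizontal_partition(rows, n_parties):
    """Each party receives a disjoint subset of rows, each with all attributes."""
    return [rows[i::n_parties] for i in range(n_parties)]

def vertical_partition(rows, attribute_groups, key="id"):
    """Each party receives every row, restricted to its attribute group plus a shared key."""
    return [
        [{key: r[key], **{a: r[a] for a in group}} for r in rows]
        for group in attribute_groups
    ]

h_parts = horizontal_partition(records, n_parties=2)
v_parts = vertical_partition(records, [["age", "bp"], ["diagnosis"]])
```

Here `h_parts` contains two lists of complete records, while `v_parts` contains two lists covering all four patients, each with only part of the schema. The shared `id` key is what allows vertically partitioned rows to be joined again.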
In many applications, sensitive information is shared among multiple parties, but privacy laws state that in distributed scenarios information can be shared only in restricted contexts (Mariscal et al., 2010; Matatov et al., 2010). To address this, several distributed data mining algorithms have been developed to exchange sensitive information among multiple parties while ensuring the privacy of the data (Sun et al., 2014; Zhou et al., 2015). Cloud computing has also attracted the attention of researchers as a platform for building distributed databases and software for third parties (Ahuja et al., 2012; Grobauer et al., 2010), owing to its mobility, availability, and lower cost. Cloud computing can be described as a cluster of multiple servers designed to handle tasks remotely, whereas data mining is responsible for extracting structured and consistent information from unstructured and semi-structured data sources. Distributed mining algorithms require high network bandwidth and cooperation among multiple parties (Ambulkar & Borkar, 2012), and they are capable of mining and managing data across distributed resources efficiently. The literature also reports several techniques for hiding sensitive information, especially for binary datasets (Ferrag et al., 2018; Modi et al., 2010). These techniques combine frequent itemsets and association rules, and their aim is to determine the semantic information among either the attributes or the transactions of databases (Matatov et al., 2010). Further, to maintain confidence and secure the sensitive information during the sanitization process, these techniques can remove some transactions and itemsets from the databases.
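The sanitization idea described above, removing items from transactions so that a sensitive itemset is no longer frequent, can be sketched as follows. This is a generic item-removal heuristic written for illustration and is not the method of any of the cited works; the toy database, the `min_support` threshold, and the choice of which item to remove are all assumptions.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def sanitize(transactions, sensitive_itemset, min_support):
    """Remove one item of the sensitive itemset from supporting transactions
    until the itemset's support drops below min_support (illustrative heuristic)."""
    sensitive = set(sensitive_itemset)
    victim = sorted(sensitive)[0]            # arbitrary but deterministic choice
    result = [set(t) for t in transactions]  # work on a copy; the input is left unchanged
    for t in result:
        if support(result, sensitive) < min_support:
            break
        if sensitive <= t:
            t.discard(victim)
    return result

# Hypothetical binary transaction database.
db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "d"}]
clean = sanitize(db, {"a", "b"}, min_support=0.4)
```

After sanitization the itemset `{"a", "b"}` falls below the chosen threshold, so a frequent-itemset miner using that threshold would no longer report it. The cost is that the modified transactions also lose some non-sensitive patterns, which is the privacy versus utility trade-off that sanitization methods try to balance.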
Data mining and machine learning techniques have also been adopted for designing several secure protocols (Li, Yang, & Ji, 2016). These include decision trees, Bayesian networks, clustering, association rule mining, and neural networks. The objective of these techniques is to ensure the privacy of sensitive information among multiple parties during the extraction of valuable information from the datasets (Prakash & Singaravel, 2015; Sui & Li, 2017). The identification of frequent itemsets and association rules is one of the major issues associated with data mining-based protocols (Bhuyan & Kamila, 2015; Liu et al., 2008). It is also observed that most data mining (DM) techniques transform the dataset in such a manner that the data is not easily interpretable during the execution of the DM algorithm (González-Serrano et al., 2017; Jiang et al., 2018).

Figure 1. Horizontal and vertical partitioning process
