Secure Data Analysis in Clusters (Iris Database)


Raghvendra Kumar (LNCT College, India), Prasant Kumar Pattnaik (KIIT University, India) and Priyanka Pandey (LNCT College, India)
DOI: 10.4018/978-1-5225-2031-3.ch004

Abstract

This chapter uses privacy preservation techniques based on data modification to ensure privacy. Privacy preservation is an important issue in data mining. The chapter considers a scenario in which a number of clients, each owning a clustered database (the Iris database), wish to run a data mining algorithm on the union of their databases without revealing any unnecessary information, while preserving the privacy of privileged information. Efficient protocols are required for privacy preservation in data mining. This chapter presents various privacy-preserving protocols that are used for security in clustered databases. The Xln(X) protocol and the secure sum protocol are used in mutual computing and can defend privacy efficiently. The chapter focuses on data modification techniques: the distributed database is modified, and the modified data set is then sent to the client admin for secure data communication with zero percent data leakage and with reduced communication and computation complexity.
Chapter Preview

Introduction

In recent years, data mining has become a very interesting topic for researchers because of its wide use in modern computer science (Agarwal, Imielinski, & Swamy, 1993; Srikant & Agarwal, 1994). That same wide use raises serious challenges regarding data privacy, which has become an active research topic in its own right. Many methods, techniques, and algorithms have already been defined and presented for privacy-preserving data mining. These techniques can be classified into two main approaches: data modification and secure multi-party computation (Agrawal & Srikant, 2000; Lindell & Pinkas, 2000). Over the last few decades, data mining has become very useful (Kantarcioglu & Clifton, 2004): databases are growing day by day and many more people are now connected through computers (Han & Kamber, 2006), so researchers need both to make data fast to access and to find the right data. The term data mining emphasizes the extraction of knowledge from large amounts of data; data mining is thus the process through which knowledge is collected from very large data sets (Sheikh, Kumar, & Mishra, 2010).

Databases are now very large and contain a great deal of information, but what we want from them is the relevant data, or particular patterns within it. Finding these is very difficult with a conventional DBMS, but data mining techniques can uncover hidden patterns and information in a large database system. Data mining can therefore also be called knowledge mining, pattern extraction, and so on. Before data mining techniques are applied, the data must go through a set of steps known as preprocessing. Although data mining is only one of the steps in the process of knowledge discovery, it has become better known by that name than the overall process (Jangde, Chandel, & Mishra, 2011).

Key Terms in this Chapter

Distributed Database: Distributed databases have become a vital area of information processing. They eliminate many of the shortcomings of centralized databases and fit more naturally into the many organizations that follow a decentralized structure. A distributed database is a collection of data that logically belongs to the same system but is spread over the sites of a computer network. It may be stored on multiple computers located in the same physical location, or may be dispersed over a network of interconnected computers. A distributed database system consists of loosely coupled sites that share no physical components.

Privacy Preserving Association Rule Mining (PPARM): In a clustered database, data is distributed among sites, and the number of sites is greater than two. No site is considered a trusted party: every party holds its own private data, and no party is able to learn another party's data. Privacy-preserving association rule mining in clustered databases mainly uses three techniques: cryptography-based, heuristic-based, and reconstruction-based.

Privacy Preserving Technique (PPT): Privacy preservation is an important concept in data mining, because when data is transferred or communicated between parties it must be secured so that other parties cannot learn what was communicated between the original parties. Privacy preservation in data mining means hiding the output knowledge of data mining, by one of several methods, when that output is valuable and private. Two main techniques are used: input privacy, in which the data is manipulated using different techniques, and output privacy, in which the data is altered in order to hide the rules.
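As a rough illustration of input privacy, the following minimal Python sketch masks numeric values with additive random noise before they leave a site. Additive noise is only one possible data modification and is an assumption here, not necessarily the modification used in this chapter; the function name `perturb`, the noise scale, and the sample values are all hypothetical.

```python
import random


def perturb(values, scale, rng):
    """Return a copy of `values` with independent uniform noise
    drawn from [-scale, scale] added to each entry.

    Assumption: additive uniform noise stands in for whatever data
    modification a site actually applies before sharing its data.
    """
    return [v + rng.uniform(-scale, scale) for v in values]


# Illustrative values in the range of Iris sepal lengths (cm).
sepal_lengths = [5.1, 4.9, 4.7, 4.6, 5.0]

rng = random.Random(0)  # seeded only so the sketch is repeatable
masked = perturb(sepal_lengths, scale=0.5, rng=rng)

for original, shared in zip(sepal_lengths, masked):
    print(f"original={original:.1f}  shared={shared:.3f}")
```

The site would share only `masked`, so the receiver sees values close to, but not equal to, the originals. A larger `scale` hides individual values better but distorts the mined patterns more.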

Clustering: The division of data into groups of similar objects is called clustering. Representing the data by fewer clusters loses certain fine details but achieves simplification: the data is modeled by its clusters. As a form of data modeling, clustering has a historical background rooted in mathematics, statistics, and numerical analysis. From a machine learning perspective, clusters correspond to hidden patterns, the search for clusters is unsupervised learning, and the resulting system represents a data concept. From a practical perspective, clustering plays an important role in data mining applications such as scientific data exploration, information retrieval and text mining, spatial database applications, Web analysis, CRM, marketing, medical diagnostics, computational biology, and many others.

Distributed Association Rule Mining (DARM): A technique for mining association rules from a data set partitioned across several sites. Let $I = \{i_1, i_2, \ldots, i_n\}$ be a set of items and $T = \{T_1, T_2, \ldots, T_n\}$ be a set of transactions, where each $T_i \subseteq I$. A transaction $T_i$ contains an item set $X \subseteq I$ only if $X \subseteq T_i$. An association rule is of the form $X \Rightarrow Y$ (with $X \cap Y = \emptyset$) and has support $S$ and confidence $C$ if $S\%$ of the transactions in $T$ contain $X \cup Y$ and $C\%$ of the transactions that contain $X$ also contain $Y$. In a horizontally partitioned database, the transactions are distributed among $n$ sites. The support of a rule is the fraction of transactions that contain $X \cup Y$: $\mathrm{Support}(X \Rightarrow Y) = \mathrm{count}(X \cup Y) / (\text{total number of transactions})$. The global support count of an item set is the sum of all local support counts: $\mathrm{Support}_g(X) = \mathrm{Support}_1(X) + \mathrm{Support}_2(X) + \cdots + \mathrm{Support}_n(X)$. The confidence of a rule is $\mathrm{Confidence}(X \Rightarrow Y) = \mathrm{Support}(X \cup Y) / \mathrm{Support}(X)$, and the global confidence can be expressed in terms of the global support: $\mathrm{Confidence}_g(X \Rightarrow Y) = \mathrm{Support}_g(X \cup Y) / \mathrm{Support}_g(X)$. The aim of distributed association rule mining is to discover all rules whose global support and global confidence are greater than the user-specified minimum support and confidence. The subsequent steps use the secure sum and secure set union methods described earlier. The algorithm is based on the Apriori algorithm, which uses the frequent item sets of size $k-1$ to generate the frequent item sets of size $k$.
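The global support and confidence computation can be sketched in Python as follows. This is a minimal single-process simulation under stated assumptions, not the chapter's implementation: the ring-style masking in `secure_sum`, the modulus, the function names, and the toy transactions are all hypothetical, and the global count is taken to be the sum of the local counts.

```python
import random

MODULUS = 2**32  # assumed to be larger than any possible global count


def secure_sum(local_values, modulus=MODULUS, rng=None):
    """Simulate a ring-based secure sum.

    The first site adds a random mask to its value, every later site
    adds its own value to the running total, and the first site removes
    the mask at the end. No site sees another site's raw value, only a
    masked running total.
    """
    rng = rng or random.Random()
    mask = rng.randrange(modulus)
    running = (mask + local_values[0]) % modulus
    for value in local_values[1:]:
        running = (running + value) % modulus
    return (running - mask) % modulus


def local_support_count(transactions, itemset):
    """Number of local transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t))


def global_support(sites, itemset):
    """Fraction of all transactions, over all sites, containing itemset."""
    count = secure_sum([local_support_count(s, itemset) for s in sites])
    total = secure_sum([len(s) for s in sites])
    return count / total


def global_confidence(sites, x, y):
    """Global count of X union Y divided by the global count of X."""
    xy = secure_sum([local_support_count(s, set(x) | set(y)) for s in sites])
    x_count = secure_sum([local_support_count(s, x) for s in sites])
    return xy / x_count if x_count else 0.0


# Toy horizontally partitioned data: three sites, eight transactions.
SITES = [
    [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}],
    [{"b", "c"}, {"a", "b"}],
    [{"a"}, {"a", "b", "c"}, {"c"}],
]

print(global_support(SITES, {"a", "b"}))       # 4 of 8 transactions
print(global_confidence(SITES, {"a"}, {"b"}))  # 4 of the 6 containing "a"
```

In this toy data the item set {a, b} appears in 4 of the 8 transactions, so its global support is 0.5, and 4 of the 6 transactions that contain a also contain b, so the confidence of a ⇒ b is about 0.67. A rule would be reported only if both values exceed the user-specified minimums.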
