Data Mining of Personal Information: A Taste of the Intrusion Legacy with a Sprinkling of Semantic Web

Dionysios Politis
Copyright: © 2009 |Pages: 16
DOI: 10.4018/978-1-60566-204-6.ch014

Abstract

This chapter presents data-mining techniques that can be used to create data profiles of individuals from anonymous data that are freely and abundantly available in open environments such as the Internet. Although in most cases such information is an approximation rather than a factual and solid representation of concrete personal data, the techniques take advantage of the vast increase in the amount of data recorded by database management systems as well as by a number of archiving applications and repositories of multimedia files.

Introduction

It is an open secret that personal data that should be handled cautiously are “leaking”, intentionally or unintentionally, due to “mistakes”, mismatches in Information Systems handshaking, or hacking. The last threat is the most difficult to cope with, since the guardians in the inner circle have not only to supervise the hierarchical administrative structure that maintains and accesses the data, but also to watch carefully the advances in Information and Communication Technologies that provide alternative routes to sensitive data for a variable number of support personnel.

However, technological innovations and multimedia gadgets of various forms have opened another way to form personal data repositories. Indeed, in recent years there has been a vast increase in the amount of data recorded by database management systems as well as by a number of archiving applications. This explosion in the amount of electronically stored data was accelerated by the success of the relational model for storing data and by the development and maturing of data retrieval and manipulation technologies. Apart from the large corporate databases that have been implemented, new forms of spreading information and storing data have emerged; the most notable of them are the World Wide Web (WWW) and the various multimedia databases formed out of diverse multimedia applications and presentations. The conjunction of hypertext languages and multimedia data such as images, video and audio shapes an immense network of information. It could be said that the WWW is virtually the largest database ever built.

While technology for storing the data has developed fast to keep up with the demand, little emphasis has been placed on developing software for analyzing the data. As Figure 1 illustrates, the conventional Data Manipulation Language (DML) interfaces are insufficient or cumbersome for extracting statistical information.

Figure 1.


Until recently, it was difficult not only to process but even to correlate information emerging from diverse fields of data warehouses. The huge amounts of stored or tracked data contain knowledge that can be deduced, covering many aspects of the activities recorded as “raw” data. The Database Management Systems in use at present manage these data sets, allowing the user to access only information explicitly present in the databases.
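The distinction between explicitly stored and deducible information can be sketched with a small example. The following is a minimal illustration, not taken from the chapter: the `visits` table, its columns, and the sample rows are all hypothetical. A plain DML query returns only what was stored, while even a simple co-occurrence statistic already requires a self-join and aggregation.

```python
import sqlite3

# Hypothetical toy data: one row per (visitor, page) pair.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (visitor TEXT, page TEXT)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [
        ("v1", "news"), ("v1", "sports"),
        ("v2", "news"), ("v2", "sports"),
        ("v3", "news"), ("v3", "weather"),
    ],
)

# Explicit information: the pages one visitor requested, exactly as stored.
explicit = [
    page
    for (page,) in conn.execute(
        "SELECT page FROM visits WHERE visitor = ? ORDER BY page", ("v1",)
    )
]

# Implicit information: which pages tend to be visited together.
# This is not stored anywhere; it has to be derived with a self-join.
pair_counts = conn.execute(
    """
    SELECT a.page, b.page, COUNT(*)
    FROM visits AS a
    JOIN visits AS b
      ON a.visitor = b.visitor AND a.page < b.page
    GROUP BY a.page, b.page
    ORDER BY COUNT(*) DESC, a.page, b.page
    """
).fetchall()

print(explicit)
print(pair_counts)
```

The second query is still manageable for pairs, but the SQL grows quickly for larger groups of items, which is one way to read the claim that DML interfaces become cumbersome for this kind of analysis.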

The actual data stored in a database is only a fraction of the knowledge that can be extracted from it. In practice, knowledge is expressed in some kind of declarative language, typically in the form of rules. This extraction of knowledge from large data sets is called Data Mining or Knowledge Discovery in Databases and is defined as the non-trivial extraction of implicit, previously unknown and potentially useful information from data (Frawley et al., 1991). However, an exhaustively mined data set does not contain only implicit information about a number of aspects of the underlying database. It also contains latent data links that can be harnessed and recorded out of free and full text retrieval data sets. The obvious benefits of Data Mining have resulted in a lot of resources being directed towards its development.
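One common way to express knowledge "in the form of rules" is an association rule with support and confidence measures. The sketch below assumes that representation purely for illustration; the chapter does not prescribe it here, and the `Rule` class, the helper functions, and the sample transactions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A rule of the form: IF antecedent THEN consequent."""
    antecedent: frozenset
    consequent: frozenset


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(rule, transactions):
    """Estimated P(consequent | antecedent) over the transactions."""
    base = support(rule.antecedent, transactions)
    if base == 0:
        return 0.0
    return support(rule.antecedent | rule.consequent, transactions) / base


# Hypothetical sample data: the set of pages seen in each session.
transactions = [
    frozenset({"news", "sports"}),
    frozenset({"news", "sports"}),
    frozenset({"news", "weather"}),
    frozenset({"weather"}),
]

rule = Rule(frozenset({"news"}), frozenset({"sports"}))
print(support(rule.antecedent | rule.consequent, transactions))
print(confidence(rule, transactions))
```

The rule itself ("visitors who read news also read sports") appears in no single record; it is a statement about the data set as a whole, which is the sense in which the extracted knowledge is implicit.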

Data Mining Techniques And Practices

Large databases are searched for relationships, patterns, and trends that were not known to exist, or were not visible, prior to the search. Sometimes these relationships might be assumed by knowledge engineers but need to be proven or refined. The result of data mining is new information or knowledge that will allow the user community to improve its performance.

The profound difficulty with data mining is that very large databases need to be processed for what is often just a few related facts. The search criteria, once used to gain insight into some particular pattern or trend, will tend to be modified before the next execution, and the data that is being examined tends to cover years of details or terabytes of storage.
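The cost of this iterative style of searching can be sketched as follows. This is a deliberately naive illustration under stated assumptions: the `frequent_pairs` function, the `min_support` parameter, and the sample transactions are hypothetical, and production systems use more elaborate algorithms. The point is only that each change to the search criteria triggers another full pass over the data.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return item pairs whose support is at least min_support.

    Naive approach: every call rescans the whole data set.
    """
    counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(t), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


# Hypothetical sample data.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "d"}]

# First execution with a strict criterion yields only a few facts.
strict = frequent_pairs(transactions, 0.5)

# The analyst relaxes the criterion and runs the search again from scratch.
relaxed = frequent_pairs(transactions, 0.25)

print(strict)
print(relaxed)
```

With four transactions the rescan is negligible; with years of detail or terabytes of storage, each modified execution repeats the same expensive pass.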
