Social Network Anonymization

DOI: 10.4018/978-1-5225-5158-4.ch002

Abstract

Technological advances have made it easy for an adversary to collect electronic records from a social network. Organisations that collect data from social networks therefore face two options: publish the data and bear the undesirable consequence of privacy threats, or withhold it and thereby prevent social scientists from analysing it to uncover useful facts that can be of high importance to society. Since both options are undesirable, one can look for an intermediate way: the data can be anonymised before publishing, so that even if an adversary obtains some information from the published network, he/she cannot derive sensitive information about any individual. By anonymization, the authors mean the perturbation of the real data in order to make it undecipherable. This chapter explores social network anonymization.

Social Network Data Anonymization Factors

Preserving the privacy of data in social networks is far more challenging than in relational data. Most privacy-preservation research applies to relational data only. Because structural relationships among actors are central to social networks, relational data algorithms cannot be applied to social networks without substantial modification. Of course, collecting the information held at the nodes of a social network yields a relational database whose tuples describe the nodes. Even if anonymisation algorithms are applied to this database, so that the attribute values of the nodes cannot be identified, an intruder still has a chance of identifying a node from its structural properties, such as its degree and its centrality measures. Anonymisation procedures for relational databases, if used on social networks, would therefore achieve only partial anonymisation. Link anonymisation techniques must be developed as well, and combining the two may lead to anonymisation procedures for social networks.
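
The point about structural properties can be illustrated with a minimal sketch. It assumes a small, invented friendship network (the names, edges and helper functions below are hypothetical and not taken from the chapter) and an adversary who knows only how many connections a target has. Labels are replaced by opaque ids, as a relational-style anonymisation would do, yet a vertex with a unique degree remains identifiable.

```python
from collections import Counter

# Hypothetical toy network; names and edges are invented for illustration.
edges = [
    ("alice", "bob"), ("alice", "carol"), ("alice", "dave"),
    ("bob", "carol"), ("dave", "erin"),
]

def degrees(edge_list):
    """Return a dict mapping each vertex to its degree."""
    deg = Counter()
    for u, v in edge_list:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)

def naive_anonymise(edge_list):
    """Replace every label with an opaque integer id; the structure is untouched."""
    ids = {}
    for u, v in edge_list:
        for name in (u, v):
            ids.setdefault(name, len(ids))
    return [(ids[u], ids[v]) for u, v in edge_list], ids

def candidates_by_degree(published_edges, known_degree):
    """Published vertices whose degree matches what the adversary knows."""
    deg = degrees(published_edges)
    return sorted(v for v, d in deg.items() if d == known_degree)

published, secret_mapping = naive_anonymise(edges)

# The adversary knows that the target has exactly three connections.
alice_candidates = candidates_by_degree(published, 3)
# Vertices that share a degree with others are harder to single out.
degree_two_candidates = candidates_by_degree(published, 2)
```

Here `alice_candidates` contains a single id, so the label replacement alone does not protect that actor, whereas `degree_two_candidates` contains three ids that the adversary cannot tell apart by degree alone.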

One of the most important factors to be handled is the background knowledge of the adversaries, that is, the information an adversary already holds and can use in an attack. Many pieces of information can be used to identify individuals, such as labels of vertices and edges, neighbourhood graphs, induced subgraphs and their combinations.
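
How richer background knowledge narrows the set of candidate vertices can be sketched as follows. The example assumes an already label-free published graph (the edge list is invented) and models the adversary's neighbourhood knowledge with a deliberately simple, hypothetical signature: the target's degree together with the number of edges among its neighbours. Real neighbourhood-graph attacks use the full neighbourhood structure; this is only a simplified stand-in.

```python
from itertools import combinations

# Hypothetical published graph with opaque vertex ids.
published_edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5)]

def adjacency(edge_list):
    """Build an undirected adjacency map: vertex -> set of neighbours."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def neighbourhood_signature(adj, v):
    """Return (degree of v, number of edges among the neighbours of v)."""
    nbrs = adj[v]
    links = sum(1 for a, b in combinations(sorted(nbrs), 2) if b in adj[a])
    return len(nbrs), links

def candidates(adj, degree=None, signature=None):
    """Vertices consistent with the adversary's background knowledge."""
    result = []
    for v in sorted(adj):
        sig = neighbourhood_signature(adj, v)
        if degree is not None and sig[0] != degree:
            continue
        if signature is not None and sig != signature:
            continue
        result.append(v)
    return result

adj = adjacency(published_edges)

# Knowing only that the target has three connections leaves two candidates.
by_degree = candidates(adj, degree=3)
# Also knowing that exactly two of the target's neighbours are connected
# to each other (one edge among the neighbours) leaves a single candidate.
by_neighbourhood = candidates(adj, signature=(3, 1))
```

In this toy graph `by_degree` holds two vertices while `by_neighbourhood` holds one, which is why the adversary's background knowledge has to be modelled before choosing an anonymisation procedure.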

The other important factor is measuring the information loss incurred in anonymizing a social network. It is hard to compare two social networks by comparing their vertices and edges individually, because the connections come into play. The same vertices and the same number of edges can be connected into a network in different ways, and the connectivity, betweenness and diameter can differ as a result. Measuring the information loss due to anonymisation is therefore not an easy job. Adding or deleting edges and vertices either loses information or introduces redundant and unnecessary information.
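
A small sketch makes the difficulty concrete. It assumes two invented networks on the same four vertices with the same number of edges, and uses two illustrative measures that are not prescribed by the chapter: the number of edges that differ between the networks and the change in diameter. The helper names are hypothetical, and the diameter routine assumes a connected graph.

```python
from collections import deque

def adjacency(vertices, edges):
    """Build an undirected adjacency map that includes isolated vertices."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def eccentricity(adj, source):
    """Longest shortest-path distance from source, found by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    if len(dist) != len(adj):
        raise ValueError("graph is not connected")
    return max(dist.values())

def diameter(adj):
    """Largest eccentricity over all vertices (connected graphs only)."""
    return max(eccentricity(adj, v) for v in adj)

def edge_difference(edges_a, edges_b):
    """Number of undirected edges present in exactly one of the two networks."""
    a = {frozenset(e) for e in edges_a}
    b = {frozenset(e) for e in edges_b}
    return len(a ^ b)

vertices = [0, 1, 2, 3]
path_edges = [(0, 1), (1, 2), (2, 3)]   # a chain of four vertices
star_edges = [(0, 1), (0, 2), (0, 3)]   # one hub connected to three leaves

path_diameter = diameter(adjacency(vertices, path_edges))
star_diameter = diameter(adjacency(vertices, star_edges))
changed_edges = edge_difference(path_edges, star_edges)
```

Both networks have four vertices and three edges, yet their diameters differ, so counting vertices and edges alone would report no loss. Any single measure such as `changed_edges` or the diameter change captures only part of what an anonymisation step alters.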

Three aspects must be kept in mind while developing protection techniques against privacy attacks: identifying the private information that is to be protected, modelling the background knowledge that an adversary can use to attack the privacy of an individual actor, and ensuring that the published network retains most of its utility after anonymization.
