Abstract
Clustering-based approaches, also known as generalization approaches, are popular in anonymizing relational data. In social networks, these generalization approaches are applied to vertices and edges of the graph. The vertices and edges are grouped into partitions called super-vertices and super-edges, respectively. In most cases, these vertices and edges are divided according to some predefined loss function. One major drawback of this approach is that the graph shrinks considerably after anonymization, which makes it undesirable for analyzing local structures. However, the details about individuals are properly hidden and the generalized graph can still be used to study macro-properties of the original graph.
TopIntroduction
For better understanding of Clustering approaches, let’s look at a small network shown in Figure 1. Its generalized version is given in Figure 2. We have considered 4 super-vertices and the vertices included in each super-vertex is clustered in the original graph.
Figure 2. Generalized version of figure
Arrangement of vertices in clusters is done using a predefined loss function. The clusters are usually labelled by their size and the number of intra-cluster edges present in them. The edges between clusters are labelled by the corresponding number of inter-cluster edges present in the network. Note that the intersection of any two clusters is usually null. A traditional definition of good clustering is that nodes assigned to the same cluster should be highly similar and nodes assigned to a different cluster should be highly dissimilar. Number of clusters, minimum size, maximum size, average size all are crucial factors of any clustering algorithm. Various algorithms have been proposed for achieving anonymization through clustering and saNGreeA is the leading algorithm in this field. We shall discuss this algorithm in the section titled saNGreeA Algorithm. Clustering-based approaches are divided into four sub-categories: vertex clustering methods, edge clustering methods, vertex and edge clustering methods, and vertex-attribute mapping clustering methods.
TopVertex Clustering Methods
Vertex clustering method generalizes an input network by grouping all the vertices into partitions and publishing the number of vertices in each partition along with the number of edges within and across every partition.