Hadoop-Based Distributed K-Shell Decomposition for Social Networks

Hadoop-Based Distributed K-Shell Decomposition for Social Networks

Katerina Pechlivanidou (University of Thessaly, Greece), Dimitrios Katsaros (University of Thessaly, Greece) and Leandros Tassiulas (Yale University, USA)
DOI: 10.4018/978-1-5225-2814-2.ch008


Complex network analysis comprises a popular set of tools for the analysis of online social networks. Among these techniques, k-shell decomposition of a network is a technique that has been used for centrality analysis, for communities' discovery, for the detection of influential spreaders, and so on. The huge volume of input graphs and the environments where the algorithm needs to run, i.e., large data centers, makes none of the existing algorithms appropriate for the decomposition of graphs into shells. In this article, we develop for a distributed algorithm based on MapReduce for the k-shell decomposition of a graph. We furthermore, provide an implementation and assessment of the algorithm using real social network datasets. We analyze the tradeoffs and speedup of the proposed algorithm and conclude for its virtues and shortcomings.
Chapter Preview

1. Introduction

The enormous progress in information technology and the seamless connectivity combined with the high availability of data sources but also some great improvements in hardware have created an ideal background for developing Online Social Networks (OSNs). Facebook, Twitter and LinkedIn are only some examples of modern online social networks that handle today huge amount of information, in means of both storing and processing. The data mostly describe pairwise interactions and are thus constructing networks.

Since their first appearance in data science, particular emphasis has been given to the analysis and mining of the above networks since they can offer both operational information and business intelligence to the OSN owner. Social Network Analysis, as it is known academically, uses a set of tools coupled with graph theory principles to study OSN’s pattern and underlying structures. Due to its importance, it has found a large variety of applications that include methods for centrality discovery of the nodes of a network, estimation of its robustness and finding the hierarchy core structure.

The latter concept is known as the k-core or k-shell decomposition of a graph (Seideman, 1983) and is particularly appealing as it highlights its underlying core structure and reveals the central nodes even in really complex scenarios. The process of retrieving the k-cores of a graph is a highly iterative procedure including node and edge pruning rounds. More specifically, if from a given graph we remove recursively all nodes and the incident lines with them of degree less that k, then the remaining graph is called a k-core. The procedure is fairly simple; yet the k-core decomposition has been used in social network analysis as a centrality measure for various purposes like the discovery of communities (Aksu et al., 2013), for detection of influential spreaders (Basaras et al., 2013) and analysis of the Internet structures (Carmi et al., 2007).

The great scientific interest in the k-shell decomposition has resulted a set of algorithms for its computation in diverse computational environments and for different types of networks. For example, there are versions of the algorithm for weighted (Garas et al., 2012) and unweighted graphs, for static or nearly static networks that are slowly acquired in a streaming fashion (Sariyuce et al., 2013). There are also versions of the k-shell decomposition method suitable to run on a single machine, either by exploiting its main memory (Batagelj & Zaversnik, 2002) or by running on a secondary storage (Cheng et al., 2011), or on a cluster of machines (Montesor et al., 2012). Of all the above categories of networks, the static graphs or those whose topology is slightly changing over time compared to the time required to analyze it, are encountered in most of the modern online social networks. We therefore focus on static and unweighted networks in our proposed k-shell decomposition method.

Complete Chapter List

Search this Book: