A Survey on Overlapping Communities in Large-Scale Social Networks

A Survey on Overlapping Communities in Large-Scale Social Networks

S Rao Chintalapudi (JNTUK, India) and H. M. Krishna Prasad M (JNTUK, India)
Copyright: © 2018 |Pages: 18
DOI: 10.4018/978-1-5225-2805-0.ch008


Social network analysis is one of the emerging research areas in the modern world. Social networks can be adapted to all the sectors by using graph theory concepts such as transportation networks, collaboration networks, and biological networks and so on. The most important property of social networks is community, collection of nodes with dense connections inside and sparse connections at outside. Community detection is similar to clustering analysis and has many applications in the real-time world such as recommendation systems, target marketing and so on. Community detection algorithms are broadly classified into two categories. One is disjoint community detection algorithms and the other is overlapping community detection algorithms. This chapter reviews overlapping community detection algorithms with their strengths and limitations. To evaluate these algorithms, a popular synthetic network generator, i.e., LFR benchmark generator and the new extended quality measures are discussed in detail.
Chapter Preview

1. Introduction

With the advent of internet, websites and social networks generate large amount of data. Social Network Analysis (Aggarwal 2014) (Reza et al., 2014) is a traditional research area that faces the challenge of big data in the recent years. As social networks are getting more popular, analyzing such networks has become one of the most important issues in various areas. In the era of big data, the amount of available data is growing unprecedentedly. Thus, data analysis techniques need very scalable approaches that can cope with huge network datasets. The significance of big data has attracted the concern of governments, companies and scientific institutions. The voluminous data available in social network sites and web like Facebook, Twitter, Instagram, Linkedin, Weibo, World Wide Web and Wikipedia can be treated as bi data. This data can be represented as a graph/ Network, where nodes denote persons or pages, while edges represent the relationship between persons or pages. This relationship represents following in Twitter, friendship in Facebook, professional connections in LinkedIn, hyperlinks in WWW. This data may constitute several communities based interactions between people or entities. The members of community have some common interests such as movies, travel, photography, music, novels … etc and hence, they tend to interact more frequently within the community than the outside. Finding Communities in social networks is similar to Graph clustering problem, hence, it can also be called as Big graph clustering with respect to larger networks.

The basic objective of the chapter is to answer the following three questions in detail.

  • 1.

    What is community detection?

  • 2.

    How can we detect overlapping communities?

  • 3.

    How can we evaluate detected overlapping communities?

This chapter is organized as follows. Section 2 discusses about the preliminaries of community detection. The several popular overlapping community detection algorithms with their strengths, limitations are discussed in section 3. In section 4, the synthetic networks and real world networks with their quality measures such as Overlapping Normalized Mutual Information (ONMI), Omega index, F-Score, Overlap Modularity that can be used for evaluating overlapping community detection algorithms were discussed. Finally, conclusions and the future research directions were discussed in section 5.


2. Preliminaries

Given an undirected and unweighted graph G(V,E) representing a social network, where V ={v1,v2,…,vn} is the non-empty set of vertices representing actors and E={(i,j)|vi,vjV is the set of edges representing relationships among them. The cardinality of vertex set (|V|) and edge set (|E|) are n, m respectively. Let vi and vj be two vertices from V, if e(vi,vj) belongs to E then vi and vj are neighbors. The objective of the community detection problem is to partition the vertex set into C disjoint subsets in a way that place densely connected nodes in the same community. Here C can be given as input parameter or determined by the algorithm itself.

Community detection is similar to clustering in the context of social networks. It is one of the most fundamental problems in social network analysis.

Figure 1.

Real world network before social network analysis

Figure 2.

Real world network after social network analysis

Complete Chapter List

Search this Book: