Clustering Methods for Detecting Communities in Networks

Clustering Methods for Detecting Communities in Networks

Ademir Cristiano Gabardo (Universidade Tecnológica Federal do Paraná - UTFPR, Brazil) and Heitor S. Lopes (Universidade Tecnológica Federal do Paraná - UTFPR, Brazil)
Copyright: © 2015 |Pages: 10
DOI: 10.4018/978-1-4666-5888-2.ch344
OnDemand PDF Download:
$30.00
List Price: $37.50

Chapter Preview

Top

Introduction

Real-world networks, such as social networks, enterprise relationships, and the Internet itself, present large amounts of data that can be represented as networks and organized according to some criteria. Such criteria can be, for instance, a measure of similarity, connectivity or a physical distance. In the last years, many efforts have been spent in graph clustering, so as to develop and apply efficient computational methods to group massive data and find communities in networks (Frank, 1996).

As an example we can address social networks. Social networks are groups of individuals or entities that share one or more types of relationships, these relationships can be of various types, common interests, degrees of kinship, shared services, etc. With the popularization of the Internet, there is an increasing number of connected devices, and even more people and organizations are sharing information. Consequently, social networks are becoming ubiquitous (Kumar, Novak, & Tomkins, 2010). Popular social networks such Twitter, Facebook, Google, etc. are widely known by the general public. There is also a large amount of other social related data among the Internet and other networks forming implicit social networks. For instance, citation networks, e-mail traffic, phone users, coworkers, classmates, etc.

Social networks can reveal several aspects of the social behavior of their users, providing relevant information about relationships, identification of influential groups, spread of information, political behavior or even epidemic diseases. The analysis of complex networks has arisen in many areas, such as sociology, communications, computer science, physics and biology.

In this sense, it is relevant to identify clusters, structural communities where a large number of edges join vertices as a cohesive group, a strongly related group of members which can be described as an independent portion of the network or a subgraph.

Usually, methods for detecting communities in large networks are computationally intensive, demanding high processing power. To achieve good clustering results, efficient methods to discover communities in complex networks are needed.

There are several approaches to group the subjects in complex networks. e.g.: Graph Degree Linkage (Zhang et al., 2012), Hierarchical Clustering Algorithms (Murtagh, 1983), Nearest Neighbor Clustering (Ertoz et al., 2002), Partition Algorithms (Fortunato, 2010), etc. Some clustering methods are mathematically formulated to evaluate the connections between vertices of a graph, instead of being focused on similarity measures. The choice of methods depends on which kind of information the social network analyst is pursuing.

An example is the Girvan and Newman (2002) approach for community detection that focuses on betweenness, by removing edges with largest centrality (Freeman, 1977). Another example comprises the Modularity Optimization Methods (Newman, 2006) that uses node degree (how many connections relate to a vertex) as part of the procedure to detect communities.

K-means and its variants, on the other hand, is focused on vertex characteristics. It is more related to data-mining than to community detection, but still can be a powerful tool to group clusters of individuals with high similarity.

This article presents some of the main properties of social networks and complex networks, how the communities and clusters are characterized and the ways used to identify clusters in networks using the K-means algorithm and its main variants, the fuzzy c-means and the weighted K-means.

This article is intended for Social Networks analysts, students and researchers in the field of data mining, and for those seeking for agile methods of data arrays in complex networks. Although the K-means algorithm and its variants do not cover the completeness of community detection in complex networks it can be a powerful tool for discovering groups with high similarity. We also show an extended version that weights the data dimensions to be grouped.

Key Terms in this Chapter

Social Networks: A social structure such individuals or organizations and the relationship between them.

Fuzzy: An approach based on “intermediate degrees” rather than the “true or false” Boolean logic.

Complex Networks: Graphs with nontrivial features.

Graph: A graph is represented as a set of points (vertices) connected by lines (edges).

K-Means: A clustering algorithm to partition n observations into K clusters, which each K cluster has a centroid and each observation is assigned to a cluster.

Data Mining: The process of exploring large amounts of data in search of consistent patterns.

Cluster: A group of equal or similar elements that occurring closely or together. A sub network divided in groups of nodes with dense connections internally and sparser connections between groups.

Outdegree: The number of the edges directed out a vertex in a directed graph.

Indegree: The number of the edges directed into a vertex in a directed graph.

Community: A group or set of individuals with similarities into a network.

Centroid: The point that inside a geometric shape defines its geometric center.

Complete Chapter List

Search this Book:
Reset