Clique Size and Centrality Metrics for Analysis of Real-World Network Graphs

Clique Size and Centrality Metrics for Analysis of Real-World Network Graphs

Natarajan Meghanathan (Jackson State University, USA)
DOI: 10.4018/978-1-5225-7598-6.ch087

Abstract

The authors present correlation analysis between the centrality values observed for nodes (a computationally lightweight metric) and the maximal clique size (a computationally hard metric) that each node is part of in complex real-world network graphs. They consider the four common centrality metrics: degree centrality (DegC), eigenvector centrality (EVC), closeness centrality (ClC), and betweenness centrality (BWC). They define the maximal clique size for a node as the size of the largest clique (in terms of the number of constituent nodes) the node is part of. The real-world network graphs studied range from regular random network graphs to scale-free network graphs. The authors observe that the correlation between the centrality value and the maximal clique size for a node increases with increase in the spectral radius ratio for node degree, which is a measure of the variation of the node degree in the network. They observe the degree-based centrality metrics (DegC and EVC) to be relatively better correlated with the maximal clique size compared to the shortest path-based centrality metrics (ClC and BWC).
Chapter Preview
Top

Introduction

Network Science is a fast-growing discipline in academics and industry. It is the science of analyzing and visualizing complex real-world networks using graph theoretic principles. Several metrics are used to analyze the characteristics of the real-world network graphs; among them “centrality” is a commonly used metric. The centrality of a node is a measure of the topological importance of the node with respect to the other nodes in the network (Newman, 2010). It is purely a link-statistics based measure and not based on any offline information (such as reputation of the node, cost of the node, etc). The commonly used centrality metrics are degree centrality, eigenvector centrality, closeness centrality and betweenness centrality. Degree centrality (DegC) of a node is simply the number of immediate neighbors for the node in the network. The eigenvector centrality (EVC) of a node is a measure of the degree of the node as well as the degree of its neighbor nodes. We refer to DegC and EVC as degree-based centrality metrics. Closeness centrality (ClC) of a node is the inverse of the sum of the shortest path distances of the node to every other node in the network. Betweenness centrality (BWC) of a node is the ratio of the number of shortest paths the node is part of for any source-destination node pair in the network, summed over all possible source-destination pairs that do not involve the particular node. We refer to ClC and BWC as shortest path-based centrality metrics. Computationally efficient polynomial-time algorithms have been proposed in the literature (Brandes, 2001; Strang, 2005; Cormen et. al., 2009; Newman, 2010) to determine exact values for each of the above centrality metrics; hence we categorize centrality as a computationally lightweight metric.

A “clique” is a complete sub graph of a graph (i.e., all the nodes that are part of the sub graph are directly connected to each other). Cliques are used as the basis to identify closely-knit communities in a network as part of studies on homophily and diffusion. Unfortunately, the problem of finding the maximum-sized clique in a graph is an NP-hard problem (Cormen et. al., 2009), prompting several exact algorithms and heuristics to be proposed in the literature (Pattabiraman et. al., 2013; Fortunato, 2010; Palla et. al., 2005; Sadi et. al., 2010; Tomita & Seki, 2003). In this chapter, we choose a recently proposed exact algorithm (Pattabiraman et. al., 2013) to determine the size of the maximum clique for large-scale complex network graphs and extend it to determine the size of the maximal clique that a particular node is part of. We define the maximal clique size for a node as the size of the largest clique (in terms of the number of constituent nodes) the node is part of. Note that the maximal clique for a node need not be the maximum clique for the entire network graph; but, the maximum clique for the entire graph could be the maximal clique for one or more nodes in the network.

Since the maximal clique size problem is a computationally hard problem and exact algorithms run significantly slower on large network graphs, our goal in this chapter is to explore whether the maximal clique size correlates well to one of the commonly studied computationally lightweight metrics, viz., centrality of the vertices, for complex real-world network graphs: if we observe a high positive correlation between maximal clique size and one or more centrality metrics, we could then infer the ranking of the vertices based on the centrality values as the ranking of the vertices based on the maximal clique size in real-world network graphs. Ours will be the first chapter to conduct a correlation study between centrality and maximal clique size for real-world network graphs. To the best of our knowledge, we did not come across such a work that has done correlation study between these two metrics (and in general, a computationally hard metric vis-a-vis a computationally lightweight metric) for real-world network graphs. Throughout the chapter, we use the terms 'node' and 'vertex' as well as 'link' and 'edge' interchangeably. They mean the same.

Complete Chapter List

Search this Book:
Reset