Partitioning of Complex Networks for Heterogeneous Computing

Partitioning of Complex Networks for Heterogeneous Computing

DOI: 10.4018/978-1-5225-3799-1.ch004
OnDemand:
(Individual Chapters)
Available
$37.50
No Current Special Offers
TOTAL SAVINGS: $37.50

Abstract

The most fundamental problem in BSP parallel graph computing is to decide how to partition and then distribute the graph among the available processors. In this regard, partitioning techniques for BSP heterogeneous computing should produce computing loads with different sizes (unbalanced partitions) in order to exploit processors with different computing capabilities. In this chapter, three major graph partitioning paradigms that are relevant to parallel graph processing are reviewed: balanced graph partitioning, unbalanced graph partitioning, and community detection. Then, the authors discuss how any of these paradigms fits the needs of graph heterogeneous computing where the suitability of partitions to hardware architectures plays a vital role. Finally, the authors discuss how the decomposition of networks in layers through the k-core decomposition provides the means for developing methods to produce unbalanced graph partitions that match multi-core and GPU processing capabilities.
Chapter Preview
Top

Introduction

Processing complex networks has proven to be an important challenge, in that it requires current High Performance Computing (HPC) platforms to support a new set of novel graph computing applications. The HPC industry is utilizing clusters of computers, multi-core CPUs and CPUs combined with application-specific accelerators, such as GPUs, to improve the performance of graph algorithms on large networks.

There is a list of publicly available libraries that exploit homogeneous parallel processing, i.e. the use of a single HPC parallel architecture (e.g. a multi-core server or a HPC cluster integrated of computing nodes with the same characteristics) to power the processing of large networks modeled after real-world phenomena.

The first problem in parallel graph computing is to decide how to partition and then distribute the graph among the available processors.

In homogeneous computing, the balanced graph partitioning problem consists of finding the best way to divide the vertices of a graph into a predefined number of partitions, such that the following conditions are met: (1) the number of vertices among partitions are balanced and (2) the number of edges crossing partitions (edge cut) is minimized (Chamberlain, 1998).

Since the performance of a homogeneous system is determined by the slowest processor, workload imbalance can affect the overall system performance. However, in heterogeneous computing graph partitioning techniques should produce computing loads with different sizes (unbalanced partitions) in order to exploit processors with different computing capabilities, such as multi-cores and GPUs (Shen and Zeng, 2005).

Gharaibeh et al. (2012) and Gharaibeh et al. (2013) show that it is possible to render communication overhead negligible in kernels like BFS and PageRank by aggregating communications in a CPU + GPU heterogeneous computing context. It is also suggested that the graph partitioner should focus on producing partitions that minimize computing time rather than communications. What kind of partitions should new graph partitioning techniques produce to fit the capabilities of different parallel architectures, reduce the computation time of partitions, and benefit from the heterogeneous computing approach?

In this chapter, three paradigms of complex network partitioning are reviewed: the balanced complex network partitioning where the number of partitions is known, the unbalanced partitioning of complex networks where the number of partitions is known, and the unbalanced partitioning of complex networks where the number of partitions is unknown (community detection).

Top

Partitioning And Mapping For Parallel Computing

The main goal of graph parallel computing, and of any parallel application in general, is to obtain a high-performance algorithm that solve a problem faster while keeping programming effort and resource requirements of the program low. To achieve this we must care about several factors related to both parallel software and parallel infrastructure, such as the partitioning and the distribution of the work among available processors, as well as the impact of the communications and synchronizations in the performance of the parallel application.

In the first steps, we have to split the computation into a collection of tasks in order to expose enough concurrency to keep processes busy most of the time, but no so much concurrency to avoid that the overhead of task management exceeds benefits of decomposition. In the last steps, we determine mechanisms to assign tasks to processes. Goals are: balance the workload (computation, I/O, communication) among processes, reduce interprocess communication, and reduce overhead due to assignment operations.

Complete Chapter List

Search this Book:
Reset