Topology-Aware Load-Balance Schemes for Heterogeneous Graph Processing

Topology-Aware Load-Balance Schemes for Heterogeneous Graph Processing

DOI: 10.4018/978-1-5225-3799-1.ch005
OnDemand PDF Download:
No Current Special Offers


Inspired by the insights presented in Chapters 2, 3, and 4, in this chapter the authors present the KCMAX (K-Core MAX) and the KCML (K-Core Multi-Level) frameworks: novel k-core-based graph partitioning approaches that produce unbalanced partitions of complex networks that are suitable for heterogeneous parallel processing. Then they use KCMAX and KCML to explore the configuration space for accelerating BFSs on large complex networks in the context of TOTEM, a BSP heterogeneous GPU + CPU HPC platform. They study the feasibility of the heterogeneous computing approach by systematically studying different graph partitioning strategies, including the KCMAX and KCML algorithms, while processing synthetic and real-world complex networks.
Chapter Preview


A major strategy for graph heterogeneous computing processing is that of heterogeneous partitioning in which computation proceeds in parallel on two or more parallel architectures (such as CPUs and GPUs) at the same time (Nilakant and Yoneki, 2014). Thus, a fundamental problem in heterogeneous partitioning computing is to decide how to partition and then distribute the graph among the available processors.

Most conventional graph partitioning algorithms produce equivalent partitions and focus to reduce communication costs by minimizing the edge cut. However, in heterogeneous computing, the computing power of different processors varies, so that the size of the tasks to be scheduled should not be the same. What kind of partitions should new graph partitioning techniques produce to fit the capabilities of different parallel architectures, reduce the computation time of partitions, and benefit from the heterogeneous computing approach?

According to Gharaibeh et al. (2013) and Gharaibeh and Elizeu (2013), a graph partitioning strategy for heterogeneous computing should have: (1) low space and time complexity, (2) the ability to handle scale-free graphs, (3) the ability to handle large graphs, and (4) focus on the reduction of computation time. Additionally, a heterogeneous partitioning approach must find a trade-off between the overhead of partitioning the input and synchronizing the processors, and the gain in overall computation throughput (Nilakant and Yoneki, 2014). Having fast (linear-time) algorithms for partitioning graphs is both useful and relevant.

Gharaibeh et al. (2013) and Gharaibeh and Elizeu (2013) also show that it is possible to render communication overhead negligible in kernels like BFS and PageRank, and it is suggested that the graph partitioner should focus on producing partitions that minimize computing time rather than communication time. How to produce graph partitions that fit the capabilities of different parallel architectures, reduce the computation time of partitions, and benefit from the heterogeneous computing approach?

In this chapter, the graph partitioning problem for heterogeneous computing is re-stated as subgraph search problems to fit graph partitions to parallel architectures in an efficient manner. The objective is to present the synergy between Network Science and HPC data by studying techniques for graph partitioning based on the coreness of complex networks for load distribution on heterogeneous computing platforms.

Specifically, unbalanced graph partitioning algorithms are proposed to bisect complex networks by:

  • 1.

    Finding the densest subgraph through the recursive removal of the core of complex networks, as defined by the 978-1-5225-3799-1.ch005.m01-core decomposition.

  • 2.

    Finding unbalanced partitions by calculating central subgraphs of coarse graphs produced by considering both the 978-1-5225-3799-1.ch005.m02-core decomposition of the original graphs and the centrality of coarse subgraphs.

To study the benefits of the proposed subgraph search approach, the proposed partition algorithms are integrated into a BSP-based heterogeneous computing platform. An experimental performance analysis is then conducted by traversing large real-world and synthetic complex networks on CPU + GPU.


As discussed in the previous chapter, there are works in literature that combine the capacities of heterogeneous processors for accelerating a variety of graph computing applications. Common processors combinations are CPU + GPU (Gharaibeh et al., 2012; Gharaibeh et al., 2013; Hong et al., 2011; Zou et al., 2013; Banerjee et al., 2013; Munguia et al., 2012), CPU + APU’s (Nilakant and Yoneki, 2014), and CPU + MIC’s (Gao et al., 2014). We describe these works in the following paragraphs.

Complete Chapter List

Search this Book: