Constructing the Routing Table Automatically for NUMA System based on Uncore Disturbance Pattern

Mei Wang (Shenzhen Polytechnic, Shenzhen, China), Jie Liu (Shenzhen University, Shenzhen, China) and Qiuming Luo (Shenzhen University, Shenzhen, China)
DOI: 10.4018/IJITN.2016040109

Abstract

A NUMA system depends heavily on software that exploits the underlying hardware architecture to obtain optimized performance. The routing table of the NUMA nodes is a critical piece of architectural information for such optimization. However, the routing table is not always available, and dynamic routing around broken links makes the problem more complex. No prior work addresses generating the routing table automatically without prior knowledge of the inter-processor connections. The authors propose an algorithm based on detecting uncore disturbance through variance in memory bandwidth or in PMU event counts. The disturbance pattern vectors are obtained by running a dedicated benchmark. The routing table is then calculated step by step, beginning with directly connected nodes and proceeding to longer routing paths. Experiments show that the authors' algorithm works effectively on some real NUMA systems and on synthesized topologies, and generates the correct routing tables.

1. Introduction

With the arrival of the Big Data and in-memory computing era, NUMA (Non-Uniform Memory Access) has become a prevalent and important hardware platform architecture, one that meets the increasing memory bandwidth requirements of many-core architectures.

As the number of cores in PC servers increases, memory contention among data-intensive applications becomes more and more serious. Rather than relying on faster and bigger processor caches, NUMA has solved the memory bandwidth problem to some extent by using an asymmetric hierarchical memory model, in which the memory controllers are distributed while a global address space is maintained for all memory. Accessing remote memory requires an inter-processor connection, which means that a remote access costs more than a local access and gives rise to bandwidth contention along the routing path to the remote node.

Because of the distributed and asymmetric nature of NUMA, performance depends heavily on software that exploits the characteristics of the underlying hardware. For the virtualized environments of cloud computing, Rao et al. (2010) proposed a method called vNUMA-mgr to optimize the deployment of VMs (Virtual Machines) on NUMA architectures; their experimental results showed that High-Performance Computing workloads (memory-intensive applications) achieved performance gains of 30% to 50%. Ali (2012) also showed that VMware's ESXi Server improved performance by up to 167% after adopting virtual NUMA topology technology.

Earlier work paid much attention to data locality and is well documented (Zhang et al., 1991; LaRowe et al., 1992; Brecht, 1993; Holliday et al., 1994; Bircsak et al., 2000). Other studies focused on how to map threads and memory onto a particular NUMA architecture so as to maximize locality (Osiakwan et al., 1990; Castro et al., 2009; da Cruz et al., 2012; Tudor et al., 2011), using OS-provided APIs or other tools (Drepper, 2007; Kleen, 2005; Ribeiro et al., 2009; Lameter, 2006; Hursey et al., 2011). In the last two years, however, studies (Awasthi et al., 2010; Majo et al., 2011; Luo et al., 2013; Dashti et al., 2013) have shown that the microarchitecture has a great effect on optimizing memory performance on NUMA platforms, and that under some circumstances decreasing data locality may even yield better performance.

All of these works need to know the connection topology of the NUMA nodes and assume that it is provided. That assumption does not hold in all circumstances. Moreover, software should not be optimized for the fixed topology of one particular architecture or hardware platform. This means that software needs the ability to detect the topology and construct the routing table automatically when it is ported to a new platform. To the best of our knowledge, no published article addresses this problem, and no existing method solves it.
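
To make the notion of a routing table concrete, the following is a minimal Python sketch of how one could be derived once the directly connected node pairs are known. It is not the authors' algorithm: the preview does not describe how direct links are inferred from disturbance patterns, so the sketch assumes they are already given. The function name `build_routing_table`, the `links` input format, and the 4-node ring are hypothetical and chosen only for illustration. The sketch uses an ordinary breadth-first search, which visits one-hop neighbours first and then progressively longer paths.

```python
from collections import deque


def build_routing_table(links):
    """Return {src: {dst: (next_hop, hop_count)}} using shortest paths.

    `links` maps each node to the set of nodes it is directly connected to.
    This input is an assumption of the sketch; how to discover it is not
    covered here.
    """
    table = {}
    for src in links:
        hops = {src: 0}        # hop count from src to each reached node
        next_hop = {src: src}  # first node on the path from src
        queue = deque([src])
        while queue:
            node = queue.popleft()
            # sorted() only makes tie-breaking between equal paths repeatable
            for neighbour in sorted(links[node]):
                if neighbour not in hops:
                    hops[neighbour] = hops[node] + 1
                    # A direct neighbour of src is its own next hop; farther
                    # nodes inherit the next hop of the node that reached them.
                    next_hop[neighbour] = (
                        neighbour if node == src else next_hop[node]
                    )
                    queue.append(neighbour)
        table[src] = {
            dst: (next_hop[dst], hops[dst]) for dst in hops if dst != src
        }
    return table


# Hypothetical 4-node ring topology: 0-1, 1-2, 2-3, 3-0
ring = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
table = build_routing_table(ring)
```

In this ring, node 0 reaches nodes 1 and 3 directly and reaches node 2 in two hops. When two paths have equal length, the sketch simply picks the lower-numbered neighbour; real hardware may route differently.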
