GPU Scaling: From Personal Supercomputing to the Cloud

Yaser Jararweh (Jordan University of Science and Technology, Irbid, Jordan), Moath Jarrah (Jordan University of Science and Technology, Irbid, Jordan) and Abdelkader Bousselham (Qatar Environment and Energy Research Institute, Doha, Qatar)
DOI: 10.4018/ijitwe.2014100102

Abstract

Current state-of-the-art GPU-based systems offer unprecedented performance advantages by accelerating the most compute-intensive portions of applications by an order of magnitude. GPU computing presents a viable solution for the ever-increasing complexity of applications and the growing demand for immense computational resources. In this paper, the authors investigate different platforms of GPU-based systems, from Personal Supercomputing (PSC) to cloud-based GPU systems. The authors explore and evaluate these platforms and compare them against conventional high-performance cluster-based computing systems. The evaluation shows the potential advantages of using GPU-based systems for high-performance computing applications while meeting different scaling granularities.

1. Introduction

Universities, research institutions, and industries in different fields of computing face the challenge of meeting the ever-increasing computational demands of today’s applications. A recent trend in high-performance computing is the development and use of architectures and accelerators that combine various granularities of parallelism using thousands of heterogeneous processing elements (PEs). The demand for these accelerators is driven primarily by large-scale scientific and media applications such as climate change modeling, molecular biology, 3D medical imaging, computer gaming, and multimedia. Many high-performance accelerators are available today, such as Graphics Processing Units (GPUs), Field Programmable Gate Arrays (FPGAs), and the Cell Broadband Engine (IBM Cell BE). These processing elements are available as accelerators or as many-core processors, and they are designed to achieve higher performance for data-parallel applications. Compared to conventional CPUs, the accelerators can offer order-of-magnitude improvements in performance per dollar and performance per watt (Feng & Manocha, 2007).

Massively parallel GPUs have become common components in most of today’s computing systems, ranging from personal workstations to high-performance computing clusters and cloud-based resources. For example, an NVIDIA Tesla C1060 GPU, which contains 240 Streaming Processor (SP) cores and has a memory bandwidth of 102 GB/sec, delivers approximately one TFLOPS of peak performance. Emerging general-purpose programming models for GPUs, such as CUDA, OpenACC, and OpenCL, give programmers a great opportunity to use these massively parallel devices to improve computational performance. This is necessary in order to handle compute-intensive applications and to reduce development cycle costs compared to the earlier method of using graphics programming APIs (Jararweh et al., 2012). Many researchers and developers have reported that using GPUs can yield a significant speedup over CPU-only implementations (Stone et al., 2007, 2009), with hundreds of applications reporting speedups of 100 times or more.
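The "approximately one TFLOPS" figure quoted for the Tesla C1060 can be checked with a back-of-the-envelope calculation. The sketch below is illustrative only: the core count and memory bandwidth are taken from the text, while the shader clock (1.296 GHz) and the three single-precision operations per core per cycle are assumptions drawn from commonly published C1060 specifications, not from this article. The function names are hypothetical.

```python
# Rough peak-throughput estimate for a GPU.
# SP_CORES and BANDWIDTH_GB_S come from the text; SHADER_CLOCK_GHZ and
# FLOPS_PER_CYCLE are assumptions (not stated in the article).

def peak_gflops(cores: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak single-precision throughput in GFLOPS."""
    return cores * clock_ghz * flops_per_cycle

def flops_per_byte(gflops: float, bandwidth_gb_s: float) -> float:
    """Machine balance: FLOPs available per byte of memory traffic."""
    return gflops / bandwidth_gb_s

SP_CORES = 240            # from the text
BANDWIDTH_GB_S = 102.0    # from the text
SHADER_CLOCK_GHZ = 1.296  # assumption: published C1060 shader clock
FLOPS_PER_CYCLE = 3       # assumption: multiply-add plus multiply per cycle

peak = peak_gflops(SP_CORES, SHADER_CLOCK_GHZ, FLOPS_PER_CYCLE)
balance = flops_per_byte(peak, BANDWIDTH_GB_S)
print(f"peak ~ {peak:.0f} GFLOPS, balance ~ {balance:.1f} FLOPs/byte")
```

Under these assumptions the estimate comes to roughly 933 GFLOPS, consistent with the rounded figure above. The ratio of about 9 FLOPs per byte suggests that a kernel needs substantial arithmetic per byte fetched from device memory before it approaches peak throughput, which is one reason reported speedups vary so widely between applications.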

Using FPGAs in high-performance computing is also a common trend. Many applications have shown an order-of-magnitude improvement in performance through the use of FPGA technologies (Jararweh et al., 2012, 2013; Tawalbeh et al., 2010; Moh’d et al., 2011). Although the power consumption of GPU systems is still a major concern and better hardware management is needed (Jararweh et al., 2012, 2011), the emergence of GPU-based computing has shifted the focus away from FPGA-based computing because of the GPUs' superior performance. The Cell processor, on the other hand, combines 8 synergistic processors with a 64-bit Power Architecture core and can deliver about an order-of-magnitude increase in performance compared to regular processors (Kahle, 2005). However, Cell processors were not widely adopted by the High-Performance Computing (HPC) community because of their programming difficulty and high power consumption.
