Toward a Proof of Concept Implementation of a Cloud Infrastructure on the Blue Gene/Q

Patrick Dreher, William Scullin, Mladen Vouk
Copyright: © 2015 |Pages: 10
DOI: 10.4018/ijghpc.2015010103

Conventional cloud computing architectures may seriously constrain computational throughput for HPC applications and data analysis. The traditional approach to circumventing such problems has been to map such applications onto specialized hardware and co-processor architectures. These shortcomings have given rise to calls for software that can provide richer environments for implementing clouds on these HPC architectures. It was recently reported that a proof of concept cloud computing system was successfully embedded in a standard Blue Gene/P HPC supercomputer. This software-defined system re-arranged user access to nodes and dynamically customized features of the BG/P architecture to map cloud systems and applications onto the BG/P. This paper reports on efforts to extend the results achieved on the BG/P to the newer BG/Q architecture. This work demonstrates the potential for a cloud to capitalize on the BG/Q infrastructure and provides a platform for developing better hybrid workflows and for experimenting with new schedulers and operating systems within a working HPC environment.

1. Introduction

Over the past several years cloud computing has made substantial technical and operational advances in reliability and availability. Today’s cloud technologies have advanced to the point where they can give end-users considerable flexibility to self-provision resources, either explicitly or implicitly, and can provide on-demand computational capabilities and services as defined and summarized in a recent NIST publication (Mell & Grance, 2011) describing the properties and characteristics of cloud computing. From the technical and operational perspective, users now have a spectrum of choices in the design and configuration of customized software stacks for various types of applications. From the business and economics perspective, a cloud option provides a mechanism to convert the large capital expenditures for the purchase, operation and maintenance of a data center into a “pay-as-you-go” expense.

Cloud computing options have been applied to situations where users face constraints on facility access to computational resources, to small prototype computations needing many short calculations with different parameters, and to large computations with minimal communication requirements between processors. All of these types of computations have been successfully implemented in either private or commercial cloud computing systems (UberCloud, 2014; Amazon Web Services, 2014). As a result, companies, academic institutions, organizations and individuals are seriously considering and experimenting with cloud computing as a platform for computation and data analysis.

Despite all of these advances, cloud computing has had only mixed success when attempting to implement supercomputing applications on these types of platforms. Users explicitly requiring high performance computing favor systems that allow them to operate “close to the metal” with the ability to tune both the hardware and storage in order to optimize computational performance. Early efforts to re-create these HPC capabilities took the most straightforward option of deploying supercomputing applications onto existing cloud platforms. Although this method did show some promise for codes with minimal inter-processor communication requirements, the more tightly coupled HPC applications suffered degraded performance. Alternative approaches involved constructing small groups of HPC clouds with more robust, uniform hardware architectures and network connections. This design provided “spill-over provisioning” from the HPC supercomputer to a cloud system when the HPC system became saturated. Although these implementations did provide some overall acceleration, the underlying difficulty of delivering HPC supercomputer level computational throughput with commodity cloud cluster hardware remained.

The basic difficulty is that general cloud computing systems lack the specialized HPC architectural infrastructure needed to deliver the required high throughput. Tightly coupled HPC codes need state-of-the-art network interconnects that can provide maximum computational throughput and minimum latency. Submitting such applications to standard cloud computing systems generally results in degraded performance. Additional performance degradation has also been attributed to a lack of uniformity in the computational hardware.
