Porting HPC Applications to Grids and Clouds

Wolfgang Gentzsch
Copyright © 2011 | Pages: 29
DOI: 10.4018/978-1-60960-603-9.ch002

Abstract

A Grid enables remote, secure access to a set of distributed, networked computing and data resources. Clouds are a natural complement to Grids towards the provisioning of IT as a service. To “Grid-enable” applications, users have to cope with the complexity of the Grid infrastructure; heterogeneous compute and data nodes; a wide spectrum of Grid middleware tools and services; and the architectures, algorithms, and programs of e-science applications. For Clouds, on the other hand, users have few possibilities to adjust their applications to the underlying cloud architecture, because that architecture is kept transparent to the user. Therefore, the aim of this chapter is to guide users through the important stages of implementing HPC applications on Grid and cloud infrastructures, together with a discussion of important challenges and their potential solutions. As a case study for Grids, we present the Distributed European Infrastructure for Supercomputing Applications (DEISA) and describe the DEISA Extreme Computing Initiative (DECI) for porting and running scientific grand-challenge applications on the DEISA Grid. For Clouds, we present several case studies of HPC applications running on Amazon’s Elastic Compute Cloud (EC2) and its recent Cluster Compute Instances for HPC. The chapter concludes with the author’s top ten rules for building sustainable Grid and cloud e-infrastructures.

Introduction

Over the last 40 years, the history of computing has been deeply marked by the affliction of application developers, who continuously port and optimize their application codes for the latest and greatest computing architectures and environments. After the von Neumann mainframe came the vector computer, then the shared-memory parallel computer, the distributed-memory parallel computer, the very-long-instruction-word computer, the workstation cluster, the meta-computer, and the Grid (never fear, it continues, with SOA, Cloud, Virtualization, Many-core, and so on). There is no easy solution to this; the real solution would be a separation of concerns between discipline-specific content and domain-independent software and hardware infrastructure. However, such a separation often comes with a loss of performance stemming from the overhead of the additional infrastructure layers. Recently, users and developers have faced another wave of complex computing infrastructures: the Grid.

Let’s start by answering the question: What is a Grid? Back in 1998, Ian Foster and Carl Kesselman (1998) attempted the following definition: “A computational Grid is a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities.” In a subsequent article, “The Anatomy of the Grid” (Foster, 2002), Ian Foster, Carl Kesselman, and Steve Tuecke revised this definition to include social and policy issues, stating that Grid computing is concerned with “coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations.” The key concept is the ability to negotiate resource-sharing arrangements among a set of participating parties (providers and consumers) and then to use the resulting resource pool for some purpose. This definition was very ambitious, and as history has shown, many of the Grid projects focused on these ambitious objectives did not lead to a sustainable Grid production environment. The simpler the Grid infrastructure, the easier it is to use, and the sharper its focus, the bigger its chance of success. And it is for good reason (which we explain in the following) that Clouds are currently becoming more and more popular (Amazon, 2007 and 2010).

Over the last ten years, hundreds of applications in science, industry, and enterprises have been ported to Grid infrastructures, mostly prototypes in the early sense of Foster and Kesselman (1998). Each application is unique in that it solves a specific problem: it models, for example, a specific phenomenon in nature (physics, chemistry, biology, etc.); is expressed as a mathematical formula together with appropriate initial and boundary conditions; is represented by its discrete analogue using sophisticated numerical methods; is translated into a programming language computers can understand; is adjusted to the underlying computer architecture; is embedded in a workflow; and is made accessible remotely to the user through a secure, transparent, and application-specific portal. These few words already summarize the wide spectrum and complexity we face in problem solving on Grid infrastructures, as the small sketch below illustrates for one link of this chain.
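To make the chain from formula to program concrete, here is a minimal sketch of our own (an illustration, not code from the chapter): the one-dimensional heat equation u_t = alpha * u_xx with fixed boundary values, discretized with an explicit finite-difference scheme and written in Python with NumPy. All parameter values are illustrative assumptions.

import numpy as np

# Model: 1D heat equation u_t = alpha * u_xx on [0, 1].
alpha, nx, nt = 0.01, 101, 500       # diffusivity, grid points, time steps (illustrative)
dx = 1.0 / (nx - 1)
dt = 0.4 * dx**2 / alpha             # chosen so alpha*dt/dx^2 <= 0.5 (stability condition)
x = np.linspace(0.0, 1.0, nx)

u = np.sin(np.pi * x)                # initial condition
u[0] = u[-1] = 0.0                   # Dirichlet boundary conditions

for _ in range(nt):
    # discrete analogue: u_i^(n+1) = u_i^n + alpha*dt/dx^2 * (u_(i+1) - 2*u_i + u_(i-1))
    u[1:-1] += alpha * dt / dx**2 * (u[2:] - 2.0 * u[1:-1] + u[:-2])

print("maximum value after %d steps: %.4f" % (nt, u.max()))

On a Grid, such a kernel would additionally be wrapped in a job description and submitted through the infrastructure’s middleware rather than run interactively.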

The user (and especially the developer) faces several layers of complexity when porting applications to a computing environment, especially to a compute or data Grid of distributed, networked nodes ranging from desktops to supercomputers. These nodes usually consist of several to many loosely or tightly coupled processors, and, more and more, these processors contain few to many cores. To run efficiently on such systems, applications have to be adjusted to the different layers, taking into account different levels of granularity: from fine-grain structures exploiting multi-core architectures at the processor level, to the coarse granularity found in application workflows representing, for example, multi-physics applications. As if this were not enough, the user also has to take into account the specific requirements of the Grid, which stem from the different components of the Grid services architecture, such as security, resource management, information services, and data management.
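The following sketch of ours (again an illustration, not code from the chapter) contrasts the two extremes of granularity: coarse-grain, independent tasks are farmed out to worker processes, much as a Grid scheduler distributes jobs to nodes, while each task exploits fine-grain data parallelism internally through vectorized NumPy operations. On a real Grid, the process pool would be replaced by the middleware’s job-submission and resource-management services.

import numpy as np
from multiprocessing import Pool

def run_task(seed):
    # Fine granularity: data parallelism inside one task, via vectorized
    # NumPy kernels that can exploit multiple cores and SIMD units.
    rng = np.random.default_rng(seed)
    a = rng.random((500, 500))
    return float(np.linalg.norm(a @ a.T))

if __name__ == "__main__":
    # Coarse granularity: independent tasks, distributed the way a Grid
    # scheduler or workflow engine farms jobs out to compute nodes.
    with Pool(processes=4) as pool:
        results = pool.map(run_task, range(8))
    print("completed %d tasks" % len(results))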
