Opportunities and Challenges in Porting a Parallel Code from a Tightly-Coupled System to the Distributed EU Grid, Enabling Grids for E-sciencE

Fumie Costen (University of Manchester, UK) and Akos Balasko (Hungarian Academy of Sciences, Hungary)
DOI: 10.4018/978-1-61350-116-0.ch009

Abstract

The computational architecture of Enabling Grids for E-sciencE (EGEE) is introduced because it made our code porting very challenging, and the discussion presented is directly applicable to EGEE users. The final solution to the code porting problem is proposed, and its performance is examined. This problem is commonly faced in other large-scale computations, so the solution is also applicable to users of other HPC facilities. This chapter offers hints to those who have difficulties running applications with heavy data Input/Output (I/O) in a computational environment whose weak point is data I/O.

Introduction

Research on distributed and parallel systems is one of the most important areas in computer science. This area is based on the exploitation of large computational and data storage capabilities. As the main components of a single computer, i.e., processors and hard drives, become smaller yet offer larger storage capacity and higher processing performance, distributed systems can integrate these individual resources into one large, heterogeneous, dynamic system that allows users to benefit from the potential performance gains. Such systems are called grids.

The main goal of a well-maintained grid is to provide large-scale resources, connected via the Internet, to researchers in the natural sciences and engineering whose applications place high demands on compute resources or must store more data than a single machine can accommodate. These applications must be parallelised to fully exploit the resource capabilities and to run faster in grid systems.

Researchers are nowadays surrounded by a variety of grid computing facilities. Some are more suitable for a given application than others, and the cost and performance of each HPC facility also differ. Furthermore, cost, performance and suitability change over time. Researchers therefore have to be prepared for changes in the computational facility and be able to adjust to a new computational environment.

This chapter shares the authors’ experience of a significant change to the computational environment used in their daily research activities and provides some hints to those who may face a similar situation.

The authors’ experience is based on the Enabling Grids for E-sciencE (EGEE) project.

The EGEE project family, funded by the European Commission, started in April 2004. It has provided academic and industrial researchers with access to large computing resources. It focused on developing and maintaining a robust and powerful grid network and its components, and on attracting new users from industry through standardized training and dissemination events.

A new grid middleware called gLite was developed during this project. Its aim was to organize and connect the components of the large, international grid system. The last project of this family (EGEE-III) ended on April 30th 2010. A new project, the European Grid Initiative (EGI), was created to continue the development of distributed systems internationally in Europe. In this project all of the old organizational ideas have been reformed. EGI manages the collaborative work of the National Grid Initiatives (NGIs), which were created to support the national grid communities and maintain the related grid services.

Another, no less important project funded by the European Commission is the European Middleware Initiative (EMI). This project aims at integrating the three major European grid middleware systems (ARC, gLite, Unicore) into a unified middleware distribution (UMD) in order to support the co-operation of researchers who work in the same research field but use different grid middleware.

The section on Computation in Electromagnetics discusses the motivation of our research and introduces the core equations necessary to understand the nature of our computation. It also describes the computational environment we used before we faced a significant change. The section on EGEE computational facility introduces the computational architecture of Enabling Grids for E-sciencE (EGEE), which is significantly different from our initial architecture. The section on the Adaptation of our code to EGEE describes the problems we faced and presents the solutions. The section on Future research direction offers some insight and suggestions for improving the computational algorithms, as well as algorithms that could be applied to the data I/O problems.

Key Terms in this Chapter

Grid Computing: Computation using a computational grid facility that consists of many computational or data storage resources. The program that performs the computation has to take into account the fact that there are many computational cores in many nodes.

Code Porting: Software is usually developed with a specific computational environment in mind. Therefore, when code written for one computational environment is to be run in another, some parts of the code have to be modified. The activity of making code that runs on one machine usable on another machine is called code porting.

Security: Some information about the users and/or the administrators of the computational facility has to be kept secret. Security concerns the methods that keep the computational facility, including the private data within it, safe.

Enabling Grids for E-sciencE (EGEE): A grid project, funded by the European Union, that provides large computational resources connected via the Internet. Applications must be parallelized in order to benefit from EGEE.

Message Passing Interface: A programming language-independent specification that provides a multimode communication protocol.

Data Input/Output: The activity of reading/loading data from data files and of producing data files that hold the computation results.

Numerical Methods: The study of how to solve equations accurately and efficiently from the viewpoint of computation.

Computational Electromagnetics: Research on electromagnetic wave propagation carried out with computational facilities.

Job Management: Scheduling and managing computational requests to a particular computing element. When there is more than one user in a grid computing facility, their computational requests (jobs) are submitted to one location. These jobs are ranked by priority and placed on the individual computational nodes.
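The ranking-and-placement idea in this definition can be illustrated with a small priority queue. The following is only a minimal sketch under stated assumptions, not the scheduler used by gLite or EGEE: the class name `JobQueue`, the node names and the convention that a lower number means a higher priority are all hypothetical, and every node is assumed to be free each time `dispatch` is called.

```python
import heapq
import itertools


class JobQueue:
    """Toy job manager: one submission point, jobs ranked by priority.

    Hypothetical illustration only; it does not model any real grid scheduler.
    """

    def __init__(self, node_names):
        self._heap = []                      # entries: (priority, order, job)
        self._counter = itertools.count()    # keeps submission order among ties
        self._nodes = list(node_names)

    def submit(self, job_name, priority):
        # Assumption: a lower number means a higher priority.
        heapq.heappush(self._heap, (priority, next(self._counter), job_name))

    def dispatch(self):
        """Place queued jobs on nodes in priority order; return {node: job}.

        Assumption: all nodes are free whenever dispatch is called.
        """
        placement = {}
        for node in self._nodes:
            if not self._heap:
                break
            _, _, job = heapq.heappop(self._heap)
            placement[node] = job
        return placement
```

With two nodes and three submitted jobs, the first call to `dispatch` places the two highest-ranked jobs and the remaining job waits for the next call.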

Finite Difference Time Domain (FDTD) Method: One of the simplest and most powerful methods for solving Maxwell's equations in the numerical simulation of electromagnetic (EM) wave propagation. Maxwell's equations are discretised in time and space. The basic equations are executed repeatedly, once per FDTD iteration. The outcome of an FDTD simulation is the signal signature (waveform) in the time domain.
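To make the update-and-repeat structure of this definition concrete, here is a minimal one-dimensional sketch in normalised units. It is an illustration under stated assumptions rather than the authors' code: free space, a Gaussian soft source, simple reflecting ends, and hypothetical names and default values (`fdtd_1d`, `courant`, the probe placed 20 cells from the source).

```python
import math


def fdtd_1d(n_cells=200, n_steps=150, source_cell=100, courant=0.5):
    """Minimal 1D FDTD in normalised units (illustrative sketch only).

    Returns the time-domain waveform recorded 20 cells from the source.
    """
    ez = [0.0] * n_cells            # electric field samples
    hy = [0.0] * n_cells            # magnetic field samples (staggered grid)
    probe_cell = source_cell + 20   # hypothetical observation point
    waveform = []

    for step in range(n_steps):
        # Update H from the spatial difference of E.
        for k in range(n_cells - 1):
            hy[k] += courant * (ez[k + 1] - ez[k])
        # Update E from the spatial difference of H.
        # ez[0] and hy[n_cells - 1] are never updated, so the ends reflect.
        for k in range(1, n_cells):
            ez[k] += courant * (hy[k] - hy[k - 1])
        # Gaussian pulse added to the field at the source cell (soft source).
        ez[source_cell] += math.exp(-((step - 30) / 10.0) ** 2)
        # Record the time-domain signal at the probe.
        waveform.append(ez[probe_cell])

    return waveform
```

The returned list is the time-domain waveform at the probe: it stays at zero until the pulse has had time to travel from the source cell to the probe cell.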

Parallel Code: When a computational environment has many cores in many nodes, a code that uses a single core can be modified so that a big task is divided into many small tasks, each small task is handled by one node, and many nodes work on the single big task at the same time. Code modified in this way is called parallel code; it can make use of more than one machine in a single run.
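The division of one big task into small tasks can be sketched as follows. This is a hedged illustration, not an MPI program and not the chapter's code: a local thread pool stands in for the nodes, the sum of squares is an arbitrary stand-in for the big task, and the function names are hypothetical. It shows only the decomposition and recombination pattern, not a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(n_items, n_workers):
    """Split indices 0..n_items-1 into n_workers contiguous, near-equal chunks."""
    base, extra = divmod(n_items, n_workers)
    chunks, start = [], 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)   # first `extra` chunks get one more
        chunks.append((start, start + size))
        start += size
    return chunks


def partial_sum_of_squares(bounds):
    """The small task: handle one chunk of the big task."""
    lo, hi = bounds
    return sum(i * i for i in range(lo, hi))


def parallel_sum_of_squares(n_items, n_workers=4):
    """The big task: divide, let the workers run concurrently, then combine."""
    chunks = split_range(n_items, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_sum_of_squares, chunks))
    return sum(partials)
```

On a real grid each chunk would be handled by a separate node, and the partial results would be exchanged through message passing rather than shared memory.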
