Practical Examples of Automated Development of Efficient Parallel Programs

DOI: 10.4018/978-1-5225-9384-3.ch006

Abstract

In this chapter, some examples of application of the developed software tools for design, generation, transformation, and optimization of programs for multicore processors and graphics processing units are considered. In particular, the algebra-algorithmic integrated toolkit for design and synthesis of programs (IDS) and the rewriting rules system TermWare.NET are applied for design and parallelization of programs for multicore central processing units. The developed algebra-dynamic models and the rewriting rules toolkit are used for parallelization and optimization of programs for NVIDIA GPUs supporting the CUDA technology. The TuningGenie framework is applied for parallel program auto-tuning: adapting sorting, Brownian motion simulation, and meteorological forecasting programs to a target platform. The parallelization of Fortran programs using the rewriting rules technique is examined on sample problems in the field of quantum chemistry.

Introduction

In this chapter, the developed software tools described in Chapter 5 (the algebra-algorithmic integrated toolkit for design and synthesis of programs (IDS), the rewriting rules system TermWare, and the auto-tuning framework TuningGenie) are applied for design, generation, and transformation of sample programs for multicore processors and graphics processing units. The IDS toolkit (Andon, Doroshenko, Tseytlin, & Yatsenko, 2007; Doroshenko, Ivanenko, Ovdii, & Yatsenko, 2016; Doroshenko, Zhereb, & Yatsenko, 2013) uses algebraic specifications based on Glushkov’s system of algorithmic algebra (SAA) (see Chapter 1), which are represented in a natural linguistic form, namely, the algorithmic language SAA/1 considered in Chapter 2. IDS is based on the method of dialogue design of syntactically correct algorithm schemes, which eliminates syntax errors during construction of algorithm specifications. To automate parallelizing and optimizing transformations of programs being designed, the rewriting rules system TermWare (Doroshenko & Shevchenko, 2006) is used. The TuningGenie framework (Ivanenko, Doroshenko, & Zhereb, 2014) is applied to automate the adjustment of programs to a target computing environment. The application of the mentioned software tools is illustrated by the development of programs in various subject domains (sorting, meteorological forecasting, quantum chemistry, and others) written in the C#, Java, and Fortran languages.

It should be noted that despite being one of the first programming languages, Fortran is still widely used, in particular, for solving scientific and engineering computation-intensive problems. Its popularity is due to its relative simplicity, its lack of complex facilities (e.g., pointers), its closeness to the mathematical description of a problem, and the efficiency of the generated binary code. Another reason for the continued use of Fortran is that in more than 50 years of its existence, a vast repository of programs, libraries, and routines for solving different scientific problems has been developed. Algorithms implemented in such programs are still valuable; however, there is a need to adapt this legacy code to new parallel computing platforms. Furthermore, due to the size and complexity of existing code, manual adaptation is not a practical option: automated tools are needed to facilitate conversion of legacy code to modern parallel platforms (Buttari et al., 2007).

There has been extensive research in the area of parallelizing existing sequential code, in particular, for multicore architectures. Some approaches require manual code modification and provide facilities that help a developer express the parallelism. Such approaches include parallel libraries (Leijen, Schulte, & Burckhardt, 2009), parallel extensions to existing languages (“OpenMP Application Programming Interface”, 2015), and new parallel languages (Saraswat, Sarkar, & von Praun, 2007). Another research direction is interactive parallelization (Ishihara, Honda, & Sato, 2006), in which a developer manually selects the loops to be parallelized and the tool applies the transformation automatically (the approach presented in this chapter also belongs to this category). Finally, there are numerous approaches to automated parallelization, mostly implemented as parallelizing compilers (Allen & Kennedy, 2001). Such systems use static analysis of source code to detect possible areas of parallelism and generate parallel binary code. Some papers also use dynamic analysis to detect parallelism based on specific input data (Rus, Pennings, & Rauchwerger, 2007), machine learning approaches to select the most appropriate transformations (Tournavitis, Wang, Franke, & O’Boyle, 2009), or auto-tuning to discover optimal parameters of transformations (Datta et al., 2008).

Key Terms in this Chapter

SAA/1: The algorithmic language based on Glushkov’s system of algorithmic algebras and focused on natural linguistic representation of schemes. It is applied for multilevel structured design and documentation of sequential and parallel algorithms and programs.

Graphics Processing Unit (GPU): A specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. Modern GPUs are very efficient at manipulating computer graphics and image processing, and their highly parallel structure makes them more efficient than general-purpose CPUs for algorithms where the processing of large blocks of data is done in parallel.

CUDA: A parallel computing platform and application programming interface (API) model created by NVIDIA. It allows software developers and software engineers to use a CUDA-enabled graphics processing unit (GPU) for general-purpose processing.

Open Multi-Processing (OpenMP): An application programming interface (API) that supports multi-platform shared memory multiprocessing programming in C, C++, and Fortran. It consists of a set of compiler directives, library routines, and environment variables that influence run-time behavior.

TuningGenie: A framework that automates the adjustment of programs to a target computing environment.

Glushkov’s System of Algorithmic Algebras (SAA, Glushkov’s Algebra): A two-sorted algebra focused on the analytical representation of algorithms and the formalized transformation of such representations.

SAA Scheme: A representation of an algorithm in the algorithmic language SAA/1.

TermWare: An open-source implementation of a rewriting rules engine written in Java. It provides a language for describing rewriting rules that operate on data structures called terms, as well as a rule engine that interprets the rules to transform terms. The TermWare.NET toolkit is based on TermWare and is intended for transformation of programs on the Microsoft .NET platform.

Integrated Toolkit for Design and Synthesis of Programs (IDS): The software system intended for automated construction of sequential and parallel algorithm schemes represented in the modified system of algorithmic algebras (SAA-M) and for synthesis of programs in the C, C++, and Java languages.
