Effective Open-Source Performance Analysis Tools

Prashobh Balasundaram (IBM Dublin Software Laboratories, Republic of Ireland)
DOI: 10.4018/978-1-61350-116-0.ch005


This chapter presents a study of leading open source performance analysis tools for high performance computing (HPC). The first section motivates the need for open source tools for performance analysis. Background on performance analysis of computational software follows, covering the performance-critical components of computers. Metrics useful for analyzing the common bottleneck patterns observed in computational codes are enumerated, followed by an evaluation of open source tools for extracting these metrics. The tools' features are analyzed from the perspective of an end user. Important factors are discussed, such as the portability of tuning applied after performance bottlenecks are identified, the hardware and software requirements of the tools, and the need for additional metrics for novel hardware features, including how to identify and measure these new metrics. The chapter focuses on open source tools because they are freely available.
Chapter Preview


Performance optimization of computational software is often an iterative and tedious process. Algorithms developed by a scientist or engineer may work well on one computer architecture yet need further performance tuning on another. Comparing performance characteristics across many hardware architectures is a routine task when benchmarking high performance computing systems for procurement. Open-source performance analysis tools are well suited to comparing application performance characteristics across a wide range of computing architectures. The main goal of this chapter is to outline a few open-source development and performance analysis tools that the author found to be effective on a wide range of computing hardware.

Today, CPUs using the x86, x86_64, or Power instruction set architectures are widely used for scientific high performance computing. Single-core processors gave way to dual- and quad-core processors mainly due to power and thermal constraints. By 2011, six-core to twelve-core processors from Intel and AMD are expected to be widely used in HPC systems. Another clear trend in scientific computing hardware is the use of general purpose graphics processing units (GPGPUs) as accelerators for HPC applications. Programmed using high-level programming languages like CUDA (Nvidia Corporation) and OpenCL implementations (Khronos Group), these commodity products are redefining the architecture of HPC systems, especially in low- and medium-scale deployments. The capability supercomputing space is dominated by massively parallel distributed memory supercomputers. These machines often use multi-core processors running at relatively low frequencies, which enables higher packaging densities.

The most widely used programming model for distributed memory parallel computers is based on the Message Passing Interface (MPI). Most recent distributed memory machines use multi-core chips, and therefore shared memory programming models like OpenMP (OpenMP ARB) are used to exploit parallelism at the node level. This helps applications scale better by reducing inter-processor communication through the interconnect (Smith L & Bull M, 2000). As the number of processor cores on a single chip increases, contention for shared resources within a multi-core processor becomes a major obstacle to linear scaling. Recent processor designs focus on techniques to reduce contention for shared processor resources such as memory bandwidth. The use of accelerators in HPC systems introduces an additional layer of software and hardware components. Effective analysis of the performance characteristics of software on these hybrid systems is an active area of research.
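The two-level decomposition described above can be sketched in C. This is a minimal illustration and is not taken from the chapter: the function names `block_range` and `partial_sum_of_squares` are hypothetical, and no MPI calls are made. The block partitioning stands in for the work each MPI rank would own, while the OpenMP pragma shows node-level parallelism over that rank's share. In a real hybrid code, the per-rank partial results would be combined with a collective such as `MPI_Reduce`.

```c
#include <stddef.h>

/* Hypothetical helper: split the index range [0, n) into nranks contiguous
 * blocks and return the half-open block [*lo, *hi) owned by `rank`.
 * The first (n % nranks) ranks each receive one extra element. */
void block_range(long long n, int nranks, int rank,
                 long long *lo, long long *hi)
{
    long long base = n / nranks;
    long long rem  = n % nranks;
    long long extra_before = (rank < rem) ? rank : rem;

    *lo = (long long)rank * base + extra_before;
    *hi = *lo + base + ((rank < rem) ? 1 : 0);
}

/* Hypothetical kernel: sum of (i + 1)^2 for i in [lo, hi).
 * The loop is shared among the threads of one node via OpenMP.
 * If the compiler is invoked without OpenMP support, the pragma is
 * ignored and the loop runs serially with the same integer result. */
long long partial_sum_of_squares(long long lo, long long hi)
{
    long long total = 0;
    long long i;

    #pragma omp parallel for reduction(+:total)
    for (i = lo; i < hi; ++i) {
        total += (i + 1) * (i + 1);
    }
    return total;
}
```

Each rank would call `block_range` with its own rank number and then `partial_sum_of_squares` on the returned range. Only the final reduction crosses the interconnect, which is the reduction in communication that the hybrid model aims for. With GCC, OpenMP is typically enabled with the `-fopenmp` flag.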

Analyzing and understanding performance bottlenecks in scientific computing applications across systems of varying hardware architectures is an important task for HPC practitioners. Standards-based open source performance analysis applications facilitate the comparison of application characteristics across many hardware architectures. They are available on a wide range of systems and often offer an independent tool-chain in addition to the one offered by the vendor of the HPC system.

Key Terms in this Chapter

Message Passing Interface: A standard application programming interface used to implement communication between compute nodes or compute cores of a massively parallel supercomputer.

Open Source Software: Software made available as source code under a license that permits users to study and modify the source code, and possibly to redistribute binaries and source code.

Hardware Performance Counters: A hardware feature available in modern processors allowing detailed analysis of performance characteristics of applications.

Performance Analysis Tools: Tools for analysing the performance characteristics of software deployed on HPC systems.

Profiling: Extracting the critical performance characteristics of an application when it is executed on a computer system.

OpenMP: A standard programming model for shared memory parallel computers.

High Performance Computing: Usage of massively parallel computers and Linux clusters to accelerate scientific computing.
