Multi-Threaded Architectures: Evolution, Costs, Opportunities


Ivan Girotto (National University of Ireland Galway, Republic of Ireland) and Robert M. Farber (Pacific Northwest National Laboratory, USA)
DOI: 10.4018/978-1-61350-116-0.ch002


This chapter focuses on the technical/commercial dynamics of multi-threaded hardware architecture development, including a cost/benefit account of current and future developments, and the implications for scientific practice.
Chapter Preview


Recently, the computer industry has moved en masse to parallel architectures. Computing technologies are now commonly based on multi- and many-core systems, with tens to hundreds of concurrent hardware processing elements on workstations and up to many thousands per data-server or supercomputer node.

This powerful computing capacity represents an extraordinary opportunity to speed up both current and future software applications. Current parallel hardware, from commodity machines to the newest leadership-class supercomputers, can provide from one to seven orders of magnitude more performance than single-core processors. Affordable general-purpose graphics processor (GPGPU) technology, sold at price points ranging from a hundred to a couple of thousand dollars per board, demonstrates performance improvements of one to three orders of magnitude across a wide range of applications in the scientific literature.

This mass adoption profoundly affects every aspect of computation-based projects (be they new or based on legacy software) including investment, planning, development, procurement, and deployment.

Current software development tools demonstrate that these massively parallel systems can be programmed in high-level languages, across a wide spectrum of problems, with outstanding performance. However, the gap between hardware and software trajectories continues to grow. Gone are the days when application performance rose automatically with each new hardware generation's higher clock frequency. As more parallel hardware becomes available, efficiency, a measure of how well the processors are utilized, is a key metric that characterizes the performance of both the algorithms and the software implementation.
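
The efficiency metric mentioned above can be sketched as speedup divided by the number of cores. The helper names and timings below are hypothetical illustrations, not measurements from the chapter:

```python
# Parallel efficiency: what fraction of ideal linear speedup a run
# actually achieves. All figures here are hypothetical examples.

def speedup(t_serial, t_parallel):
    """How many times faster the parallel run is than the serial one."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, n_cores):
    """Speedup normalized by core count: 1.0 means perfect scaling."""
    return speedup(t_serial, t_parallel) / n_cores

t1 = 120.0   # serial runtime in seconds (hypothetical)
t8 = 20.0    # runtime on 8 cores (hypothetical)
print(speedup(t1, t8))         # 6.0 -> six times faster than serial
print(efficiency(t1, t8, 8))   # 0.75 -> 75% of the cores' capacity used
```

An efficiency well below 1.0, as in this example, signals that adding more cores yields diminishing returns for that implementation.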

The trend to massive parallelism appears inescapable, and massively threaded software is a clear requirement for achieving high performance. Consumers, developers, scientists, managers and organizations need to understand that single-threaded (serial) or poorly scaling multi-threaded software will cause application performance to plateau at or near current levels on both current and future hardware. This has important implications for computation-dependent projects because it defines the limits of the computing capacity and affects both product and project competitiveness relative to other computation-based approaches. However, the cost associated with re-engineering software (and potentially re-designing algorithms) to capitalize on new parallel architectures must be weighed against application scalability and lifespan. In general, owners of legacy software are more likely to require new software and/or software development, because a significant amount of existing commercial and scientific software was developed for single-threaded processors, much of it prior to the general availability of multi-core hardware.

HPC can act as a conduit for disruptive new technologies by acting as an early adopter and proving ground for hardware and software models that radically transform both consumer and business computing market spaces. The innovations that improve performance are not always expected and while they reduce cost, improve performance and create new opportunities, they can also damage existing markets and deprecate current applications and software. This is not a random process, but rather is part of an evolutionary process driven by competition; limited by the inefficiencies of electrical components and manufacturing processes; and advanced through scientific and design ingenuity.

This chapter will explore the disruptive nature of current innovations in massively threaded architectures, beginning with the demise of faster clock speeds and the resulting renaissance in massive parallelism. It considers important architectures including GPGPUs (general-purpose graphics processing units); specialized architectures such as the Cray XMT; extreme-scale computing; and hybrid systems, examining each in terms of its impact on budgets, power, and performance. Finally, future opportunities will be discussed.

Key Terms in this Chapter

Speedup: In parallel computing, speedup refers to how much faster a parallel algorithm runs than the corresponding sequential algorithm.
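
One standard way to bound the speedup just defined is Amdahl's law, which is not named in this chapter but illustrates why poorly scaling software plateaus: if only a fraction p of a program is parallelizable, the serial remainder limits the attainable speedup no matter how many processors are added. A minimal sketch:

```python
# Amdahl's law: upper bound on speedup when a fraction p of the work
# is parallelizable and the rest must run serially.

def amdahl_speedup(p, n):
    """Best-case speedup for parallel fraction p on n processors."""
    return 1.0 / ((1.0 - p) + p / n)

print(amdahl_speedup(0.95, 8))      # ~5.9x on 8 processors
print(amdahl_speedup(0.95, 1024))   # ~19.6x: the serial 5% dominates
```

Even with 95% of the work parallelized, a thousand processors cannot deliver more than about a twenty-fold speedup, which is the plateau effect described in the chapter preview.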

Multi-Threading / Multi-Threaded Hardware: Multi-threaded computers have hardware support to execute multiple threads efficiently. The goal of multi-threading hardware support is to allow quick switching between a blocked thread and another thread that is ready to run.

Peak Performance: The theoretical peak performance represents the maximum number of floating-point operations per second possible on a given hardware platform. As Jack Dongarra notes, it can be referred to as: i) the speed that the vendor is guaranteed never to exceed, ii) the computational speed of light of the system, iii) the fastest a machine can run without software or data.
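
Theoretical peak performance is typically computed as cores × clock rate × floating-point operations per cycle. The figures in the sketch below are illustrative assumptions, not vendor specifications:

```python
# Theoretical peak in GFLOP/s: a simple product of core count,
# clock frequency, and flops issued per cycle per core.
# All hardware figures here are hypothetical.

def peak_gflops(cores, clock_ghz, flops_per_cycle):
    """Peak floating-point rate in GFLOP/s for the given parameters."""
    return cores * clock_ghz * flops_per_cycle

# e.g. a hypothetical 8-core CPU at 3.0 GHz issuing 8 flops/cycle/core
print(peak_gflops(8, 3.0, 8))   # 192.0 GFLOP/s
```

Real applications rarely approach this number, which is why the chapter treats efficiency, not peak, as the meaningful performance metric.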

Supercomputer: A supercomputer is a computer that is at the frontline of current processing capacity, particularly speed of calculation. Supercomputers are used for highly calculation-intensive tasks such as problems involving quantum physics, weather forecasting, climate research, molecular modeling and physical simulation.

High-Level Programming Languages: A high-level programming language is a programming language with strong abstraction from the details of the computer. In comparison to low-level programming languages, it may use natural language elements, be easier to use, or be more portable across platforms.

Many-Core: A many-core processor is one in which the number of cores is large enough that traditional multi-processor techniques are no longer efficient — largely due to issues with congestion in supplying instructions and data to the many processors. The many-core threshold is roughly in the range from several tens to hundreds of cores.

Multi-Threaded / Multi-Threading Software: In computer science, a thread of execution is the smallest unit of processing that can be scheduled by an operating system. A multi-threaded program can run faster on multi-threading hardware because its threads lend themselves to truly concurrent execution.
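
A minimal multi-threaded sketch using Python's standard threading module, chosen here only for brevity (the chapter does not prescribe a language). Note that in CPython the global interpreter lock serializes pure-Python bytecode, so true hardware concurrency is seen mainly with I/O-bound or native-code workloads:

```python
# Four OS-schedulable threads, each computing an independent partial
# sum; the main thread joins them and combines the results.
import threading

results = [0] * 4

def worker(i):
    # Each thread works on its own slice, so no locking is needed.
    results[i] = sum(range(i * 1000, (i + 1) * 1000))

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()          # wait for every thread to finish

print(sum(results))   # same total a serial sum over range(4000) gives
```

The decomposition into independent slices is what makes the threads "naturally lend themselves" to concurrent execution, as the definition above puts it.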

Multi-Core: A multi-core processor is a processing system composed of two or more independent cores. It can be described as an integrated circuit to which two or more individual processors (called cores in this sense) have been attached.

Memory Bound: Refers to a situation in which the time to complete a given computational problem is determined primarily by the memory operations needed to hold and move data. In other words, the limiting factor in solving the problem is memory access speed.

Scalability: The ability of a system, network, or process to handle growing amounts of work gracefully, or to be enlarged to accommodate that growth. For example, it can refer to the capability of a system to increase total throughput under an increased load when resources (typically hardware) are added.
