High-Performance Customizable Computing

Domingo Benitez (University of Las Palmas de Gran Canaria, Spain)
DOI: 10.4018/978-1-61350-116-0.ch003

Many accelerator-based computers have demonstrated that they can be faster and more energy-efficient than traditional high-performance multi-core computers. Two types of programmable accelerators are available in high-performance computing: general-purpose accelerators such as GPUs, and customizable accelerators such as FPGAs, although general-purpose accelerators have received more attention. This chapter reviews the state-of-the-art and current trends of high-performance customizable computers (HPCC) and their use in Computational Science and Engineering (CSE). A top-down approach is used to be more accessible to the non-specialists. The “top view” is provided by a taxonomy of customizable computers. This abstract view is accompanied with a performance comparison of common CSE applications on HPCC systems and high-performance microprocessor-based computers. The “down view” examines software development, describing how CSE applications are programmed on HPCC computers. Additionally, a cost analysis and an example illustrate the origin of the benefits. Finally, the future of the high-performance customizable computing is analyzed.
Chapter Preview


Frequently, automated solutions to Computational Science and Engineering (CSE) problems require that billions to trillions of complex operations be applied to input data acquired from the real world. In many cases these solutions are time critical and must be reported without delay, and frequently they must also be of the highest precision. Both the availability and the precision of information are key to solving CSE problems, and thus to making lives more comfortable and longer.

In order to reach this performance goal, high-performance computing is a research and development domain that aids the solution of CSE problems through a combination of high-performance computers and parallel programs. For many years, the fastest computers integrated central processing units (CPUs), or microprocessors, that were specialized in performing the greatest number of operations per second. Nowadays, however, the architectures of the fastest high-performance computers are dominated by large populations of multi-core programmable processors, many of which can also be integrated into desktop or server computers.

In this time of transition, new high-performance processors provide higher levels of performance than their predecessors mainly due to an increase in the number of processing cores integrated on-chip. Increasing the number of cores on a single chip offers increased computer performance at somewhat lower power dissipation than a complex single-core microprocessor with an equivalent number of transistors on-chip.

Nevertheless, the multi-core approach does not address three basic problems. First, the available on-chip computing power is not efficiently utilized by programs. Second, the connection from the processor to external memory becomes more heavily loaded as the number of cores increases; together with the difference in operating frequency between the multi-core processor and external memory, this can become a bottleneck for parallel processing and stall some or all cores. Third, effective programming of multi-core systems is difficult, and in many cases software is ultimately responsible for the lack of performance scalability as the number of cores increases (Mackin & Woods, 2006).

An alternative approach has arisen: High-Performance Customizable Computing (HPCC), a different paradigm of high-performance computing. Instead of relying only on programmable processors, customizable computers also integrate hardware coprocessors with non-fixed architectures. These high-performance computing elements can be customized for a portion of a specific program and so accelerate the execution of key steps in the application software.

Customizable hardware devices can speed up several software applications because their hardware flexibility allows the same chip to be specialized and reused; this is the defining property of High-Performance Customizable Computing, and it is very useful for exploiting the inherent parallelism of many CSE problems. Customizable devices have shown great potential for use in high-performance computing, with much better power efficiency than programmable processors. New customizable devices are providing ever higher performance because both their clock frequency and the number of transistors dedicated to specialized processing are increasing. Additionally, customizable devices have other advantages that are exploited in embedded hardware engineering, such as reducing both the non-recurrent engineering costs (Dehon, 2008) and the development time of a product (Guccione, 2008).

Two types of computing systems that integrate customizable devices are common nowadays: configurable and reconfigurable systems. Configurable Systems are built from baseline chip designs that are partially specialized during design-time and before fabrication (Leibson, 2006). After chip fabrication, these systems can be software-programmed but cannot be specialized anymore. On the other hand, Reconfigurable Systems are based on field-programmable devices that can be completely customized after fabrication (Chang, 2008).

The main goal of this chapter is to help readers understand how customizable hardware systems can be exploited to provide high performance, i.e., how to get 10X, 100X or 1000X the performance of the equivalent number of transistors in a microprocessor-based computer with much better power efficiency. The reader will gain insight into the design, management and use of high-performance infrastructures that integrate microprocessor-based and customizable computers.

Key Terms in this Chapter

Field Programmable Gate Array (FPGA): is a reconfigurable electronic device with a fine-grain architecture that implements customized computational logic specific to the application being executed, and that can be reconfigured for a wide range of tasks (Maxfield, 2004). Its reconfigurable architecture is composed of: processing elements called Look-Up-Tables (LUTs), each of which can implement any logic function with a few inputs; an interconnection network that can connect any logic cell to the rest of the circuit; memory blocks that store data to be loaded by any other element of the architecture; and special on-chip modules that efficiently perform frequently used tasks such as multiplication, digital signal processing, and external input-output interfacing.
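The behaviour of a LUT can be illustrated in software: a k-input LUT is essentially a stored truth table of 2^k bits, indexed by the input bits. The following minimal Python sketch (illustrative only; the class and method names are invented for this example, not part of any FPGA toolchain) shows how configuring the table lets the same element implement any small logic function:

```python
class LUT:
    """A k-input Look-Up-Table configured with a truth table of 2**k output bits."""

    def __init__(self, k, truth_table):
        # The stored bits fully determine which Boolean function is implemented.
        assert len(truth_table) == 2 ** k
        self.k = k
        self.table = truth_table  # list of 0/1 output bits

    def evaluate(self, *inputs):
        # The input bits are concatenated into an index into the truth table.
        index = 0
        for bit in inputs:
            index = (index << 1) | bit
        return self.table[index]

# "Configure" a 2-input LUT as XOR: outputs for inputs 00, 01, 10, 11.
xor_lut = LUT(2, [0, 1, 1, 0])
print(xor_lut.evaluate(1, 0))  # -> 1
```

Reconfiguring the device amounts to rewriting the stored truth tables (and the interconnect), which is why the same silicon can be reused for very different applications.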

Reconfigurable Devices: are versatile configurable electronic components that are used to build distinct hardware implementations on the same set of reconfigurable resources after chip fabrication (Compton & Hauck, 2002). Reconfigurability is achieved with an integrated configuration memory that stores the state and functionality of each part of the device. The device is configured by loading a configuration bitstream, consisting of a series of commands and frame data. At any time after a reconfigurable device has been powered up, it is possible to suspend its operations, load in a completely new hardware configuration, and restart its operation using the newly loaded configuration (Leibson, 2006).

Processor, or Central Processing Unit (CPU): is the central circuit of a computer that processes a sequence of jobs arriving over time and actually executes the application program. Since the beginning of CPUs, their performance has been driven by higher clock rates and improved internal organization of the circuit. In recent years, new CPUs with ever higher clock rates have not been commercially viable, and the technology trend has been to integrate more than one core on a chip. Additionally, low-power CPUs are playing a central role in high-performance computers because building ever-larger clusters of commercial off-the-shelf hardware is constrained by power and cooling (Donofrio et al, 2009; Henkel & Parameswaran, 2007).

High-Performance Computers (HPC): provide hardware and software infrastructures whose main goal is to accelerate the execution of customer applications or improve their fault-tolerance. Usually, these machines are composed of multiple processors, large memory capacity, large disk storage, and high-bandwidth communications among all their main components (Blake et al, 2009).

Coprocessors: are specialized circuits that can be integrated into a computer and connected to a CPU to provide added performance for applications, implementing specific computational tasks (Gulati & Khatri, 2010).

Customizable Electronic Devices: can be customized to efficiently execute a task and are frequently used as CPUs and/or coprocessors. They can typically achieve execution 10-1000 times faster than today's fastest CPUs, with a reduction of about 95% in power consumption. Three types of customizable devices can be distinguished: ASICs, Configurable Processors, and Reconfigurable Devices.

Application Specific Integrated Circuit (ASIC): is an integrated electronic circuit that is customizable during the design phase to efficiently execute a specific task. After fabrication, it cannot be modified to execute other tasks. This customizable device can achieve the best performance and the lowest energy consumption. However, its design and fabrication costs are very high and can only be justified if the number of chips sold is very large (Rigo et al, 2010).

Configurable Processors: are special ASICs that are based on a conventional CPU and tailored during chip design time for a specific software application. These processors achieve much better computing efficiency and much lower power consumption than the original CPU, but after fabrication they cannot be configured again.

Instruction Set Architecture (ISA): is the set of hardware elements of the processor that can be managed by a software program. The program controls the hardware through standardized machine instructions; a program is a composition of machine instructions that the CPU loads sequentially over time (Patterson & Hennessy, 2009).
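The sequential fetch-execute behaviour described above can be sketched in a few lines. The tiny accumulator "ISA" below is invented purely for illustration (its opcodes and encoding correspond to no real processor); it only shows how a program is a sequence of machine instructions consumed one at a time:

```python
def run(program):
    """Execute a list of (opcode, operand) instructions on a single accumulator."""
    acc = 0
    pc = 0  # program counter: instructions are fetched sequentially over time
    while pc < len(program):
        opcode, operand = program[pc]  # fetch and decode
        if opcode == "LOAD":           # execute
            acc = operand
        elif opcode == "ADD":
            acc += operand
        elif opcode == "MUL":
            acc *= operand
        pc += 1                        # advance to the next instruction
    return acc

# A "program" is simply a composition of machine instructions.
result = run([("LOAD", 2), ("ADD", 3), ("MUL", 4)])
print(result)  # -> 20
```

A customizable device, by contrast, is not bound to one fixed instruction set: its datapath can be reshaped for the application instead of being driven instruction by instruction.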

Parallelization: is the software technique by which an application program is partitioned so that its independent parts can efficiently activate the independent resources of a computer. This partitioning can be done at the instruction, data, thread, procedure, or program level. Many CSE applications can be parallelized; executing the resulting parallel programs on HPC platforms improves costs and reduces the manpower required (Akhter & Roberts, 2006).
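As a concrete example of data-level parallelization, the sketch below partitions a data set across worker processes using Python's standard library. The workload (`heavy_kernel`) is a hypothetical stand-in for a compute-intensive CSE operation, not taken from the chapter:

```python
from concurrent.futures import ProcessPoolExecutor

def heavy_kernel(x):
    # Stand-in for a compute-intensive operation applied to one data item.
    return sum(i * i for i in range(x))

def parallel_map(data, workers=4):
    # Data-level parallelism: the data set is partitioned across independent
    # worker processes, each part is processed independently, and the results
    # are gathered back in their original order.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(heavy_kernel, data))

if __name__ == "__main__":
    print(parallel_map([10, 20, 30]))  # -> [285, 2470, 8555]
```

The same decomposition idea underlies HPCC: the partitioned kernel, instead of being mapped onto worker processes, is mapped onto customized hardware resources.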
