Real-Time Image and Video Processing Using High-Level Synthesis (HLS)

Murad Qasaimeh (Iowa State University, USA) and Ehab Najeh Salahat (Australian National University, Australia)
DOI: 10.4018/978-1-5225-2848-7.ch015

Implementing high-performance, low-cost hardware accelerators for computationally intensive image and video processing algorithms has attracted considerable attention over the last 20 years. Much recent research has sought new design automation methods that close the gap between the ability to realize efficient accelerators in hardware and the tight performance requirements of complex image processing algorithms. High-Level Synthesis (HLS) is a new method that automates the design process by transforming a high-level algorithmic description into digital hardware while satisfying the design constraints. This chapter evaluates the suitability of HLS as a tool for accelerating the most demanding image and video processing algorithms in hardware. It discusses the benefits and current limitations of HLS, recent academic and commercial tools, compiler optimization techniques, and four case studies.

1. Introduction

In the last two decades, the complexity of image and video processing algorithms has increased continuously to meet the demands of complex applications. Pure software implementations of most of these algorithms on embedded systems are still far from real-time performance, which has forced embedded designers to accelerate them in hardware. Some have used graphics processing units (GPUs) and others multicore CPUs, but since their introduction FPGAs have proven to be one of the most suitable platforms for implementing image processing algorithms in hardware. An FPGA can be configured to exploit the spatial and temporal parallelism of an image processing algorithm by realizing multiple processing pipelines that process data concurrently.
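
To make the pipeline idea concrete, the following is a minimal sketch of a streaming 3x3 box filter written in the C++ style that HLS tools accept. The image size, the function name box3x3 and the zero-border policy are illustrative assumptions, and the #pragma HLS PIPELINE II=1 directive follows the Xilinx Vivado HLS syntax (an ordinary C++ compiler simply ignores it). Each iteration of the pixel loop consumes one new pixel, so successive pixels overlap in the pipeline (temporal parallelism), while the nine window values can be summed by parallel hardware once the tool unrolls the inner loops of the pipelined loop (spatial parallelism). Two line buffers keep the previous two rows on chip, so every input pixel is read only once.

```cpp
#include <cstdint>

// Hypothetical frame size; HLS tools generally need compile-time bounds.
constexpr int WIDTH = 8;
constexpr int HEIGHT = 6;

// Streaming 3x3 box filter (illustrative sketch, not taken from the chapter).
// Border pixels, where the 3x3 window is incomplete, are written as 0.
void box3x3(const uint8_t in[HEIGHT * WIDTH], uint8_t out[HEIGHT * WIDTH]) {
    uint8_t line0[WIDTH] = {0};  // holds row y-2
    uint8_t line1[WIDTH] = {0};  // holds row y-1
    uint8_t win[3][3] = {{0}};   // sliding 3x3 window

    // Clear the output so border pixels end up as 0.
    for (int i = 0; i < HEIGHT * WIDTH; ++i) out[i] = 0;

    for (int y = 0; y < HEIGHT; ++y) {
        for (int x = 0; x < WIDTH; ++x) {
#pragma HLS PIPELINE II=1
            uint8_t px = in[y * WIDTH + x];

            // Shift the window one column to the left.
            for (int r = 0; r < 3; ++r) {
                win[r][0] = win[r][1];
                win[r][1] = win[r][2];
            }
            // New right-hand column: rows y-2, y-1 and y at column x.
            win[0][2] = line0[x];
            win[1][2] = line1[x];
            win[2][2] = px;

            // Update the line buffers for the next row.
            line0[x] = line1[x];
            line1[x] = px;

            // The window is complete and centered at (y-1, x-1).
            if (y >= 2 && x >= 2) {
                int sum = 0;
                for (int r = 0; r < 3; ++r)
                    for (int c = 0; c < 3; ++c)
                        sum += win[r][c];
                out[(y - 1) * WIDTH + (x - 1)] = static_cast<uint8_t>(sum / 9);
            }
        }
    }
}
```

Because the loop body reads one pixel and writes at most one pixel per iteration, an initiation interval of one maps naturally onto a camera stream that delivers one pixel per clock.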

The growing complexity of the algorithms has been accompanied by growth in the silicon capacity of FPGA chips (Trimberger, 2015): every couple of years, capacity has increased as transistor feature sizes have shrunk. Despite this improvement, porting these algorithms manually to embedded systems remains a hard and time-consuming task, because it requires implementing the whole pipeline at the Register Transfer Level (RTL) in a hardware description language such as VHDL or Verilog. Several solutions have been proposed to speed up the design process, such as simplifying the algorithm itself, using softcore CPUs, or using High-Level Synthesis (HLS). Simplifying the algorithm usually sacrifices accuracy, and softcore CPUs are not always an efficient way to reach an application's target performance and power budget.

High-Level Synthesis (HLS) offers a way to generate RTL designs automatically from a high-level abstraction while satisfying the design constraints and optimizing a given cost function. The main goal is to build and verify hardware efficiently by giving embedded system designers better control over the optimization of their designs. Designers describe the design at a high level of abstraction in a language such as C, C++ or even MATLAB, and the tool generates the RTL implementation from this system-level description. Verifying the generated RTL, that is, checking that the functionality of the generated hardware architecture matches the high-level description, is an important part of the process, and it can also be done at a high level using C/RTL co-simulation.
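
As an illustration of this flow, the sketch below pairs a small hardware function with a C++ test bench. The names threshold_hw, threshold_ref and testbench and the frame size are hypothetical. The same test bench can drive the C model during C simulation and, in a tool such as Vivado HLS, the generated RTL during C/RTL co-simulation; in that flow a test bench main() that returns 0 is treated as a pass, so here testbench() stands in for the body of such a main().

```cpp
#include <cstdint>
#include <cstdio>

constexpr int N_PIXELS = 64;  // hypothetical frame size

// Top-level function: the part an HLS tool would synthesize into RTL.
void threshold_hw(const uint8_t in[N_PIXELS], uint8_t out[N_PIXELS], uint8_t thr) {
    for (int i = 0; i < N_PIXELS; ++i) {
#pragma HLS PIPELINE II=1
        out[i] = (in[i] > thr) ? uint8_t{255} : uint8_t{0};
    }
}

// Golden software reference used only by the test bench.
static uint8_t threshold_ref(uint8_t px, uint8_t thr) {
    return px > thr ? 255 : 0;
}

// Test bench: applies stimulus, runs the design and compares against the
// reference. Returns 0 when every output pixel matches, 1 otherwise.
int testbench() {
    uint8_t in[N_PIXELS], out[N_PIXELS];
    for (int i = 0; i < N_PIXELS; ++i) in[i] = static_cast<uint8_t>(i * 4);

    const uint8_t thr = 128;
    threshold_hw(in, out, thr);

    int errors = 0;
    for (int i = 0; i < N_PIXELS; ++i) {
        if (out[i] != threshold_ref(in[i], thr)) {
            std::printf("mismatch at pixel %d\n", i);
            ++errors;
        }
    }
    return errors == 0 ? 0 : 1;
}
```

Keeping the reference model and the stimulus in plain C++ is what lets verification happen at the algorithmic level before any waveform is inspected.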

Several advantages can be gained by using HLS for image processing hardware accelerators (Bailey, 2015). First, designers write much less code with HLS than with manual RTL design, which saves a lot of time and reduces the risk of mistakes. HLS also gives designers the flexibility to optimize their designs by tweaking the source code and tool options to explore a large space of design alternatives. It reduces verification time as well, since the tools can be used to generate high-level test benches. It makes it possible to handle more complex designs by removing the need for manual RTL coding, and it allows non-experts in hardware to generate accelerators for their algorithms with minimal effort. There have been many success stories of using HLS tools to accelerate image processing algorithms, but there are still few systematic studies that evaluate these designs.
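
Design space exploration through source tweaks can be sketched with a compile-time parameter. In the hypothetical example below, PPC (pixels per clock) controls how many pixels a pipelined loop iteration processes; changing one template argument trades FPGA area for throughput without rewriting the algorithm. The pragmas follow Vivado HLS syntax; in that tool, loops nested inside a pipelined loop are unrolled automatically, so the UNROLL directive is shown mainly for clarity. In a real design the arrays would also need partitioning (for example with an ARRAY_PARTITION directive) so that PPC pixels can actually be read and written in the same cycle.

```cpp
#include <cstdint>

constexpr int WIDTH = 16;  // hypothetical line width

// Inverts one image line. PPC (pixels per clock) is a hypothetical design
// knob: a larger PPC means more parallel hardware and fewer loop iterations.
template <int PPC>
void invert_line(const uint8_t in[WIDTH], uint8_t out[WIDTH]) {
    static_assert(WIDTH % PPC == 0, "WIDTH must be a multiple of PPC");
    for (int i = 0; i < WIDTH; i += PPC) {
#pragma HLS PIPELINE II=1
        for (int k = 0; k < PPC; ++k) {
#pragma HLS UNROLL
            out[i + k] = static_cast<uint8_t>(255 - in[i + k]);
        }
    }
}
```

Instantiating invert_line<1> and invert_line<4> yields functionally identical designs, so the same test bench can check both while the synthesis reports are compared for latency and resource use.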

The objective of this chapter is to evaluate the suitability of HLS for implementing real-time image and video processing in embedded systems. It also explains in detail the HLS design process and the optimization methods needed to generate efficient circuits, and it presents four image processing algorithms and evaluates the performance of their HLS implementations. The chapter is organized as follows. Section 2 presents brief background on FPGAs, HLS and other necessary concepts. Section 3 discusses the benefits and limitations of HLS compared with other design methods. Section 4 presents a number of HLS optimization techniques. Section 5 gives an overview of current academic and commercial tools. Section 6 shows an example of an HLS video C/C++ library. Section 7 evaluates four case studies. Section 8 briefly introduces future work and current open research problems, and Section 9 concludes the chapter.

Key Terms in this Chapter

Stereo Vision: The process of extracting 3D information from digital images taken by two horizontally displaced cameras, which provide two different views of the same scene.

VHDL: (VHSIC Hardware Description Language) is a hardware description language used to design hardware circuits.

HLS: (High-Level Synthesis) is a new design automation method that generates RTL designs from a description written in a high-level programming language such as C, C++ or MATLAB.

Verilog: A hardware description language (HDL) used to design and model electronic circuits. It is one of the most commonly used languages for designing digital circuits at the register transfer level.

FPGA: (Field-Programmable Gate Array) is a reconfigurable integrated circuit (IC) that can be reprogrammed after manufacturing. It consists of a large matrix of configurable logic blocks connected through programmable switching blocks.

OpenCV: (Open Source Computer Vision) is a cross-platform computer vision and image processing library originally developed by Intel. It covers a wide range of application areas in image and video processing.

RTL: (Register Transfer Level) is a design method that models digital circuits in terms of the flow of data between hardware registers.

GPU: (Graphics Processing Unit) is a specialized integrated circuit designed with thousands of small, efficient cores that handle many tasks simultaneously.

DSPs: (Digital Signal Processors) are specialized microprocessors whose architecture is optimized for the operational needs of digital signal processing, such as voice and image processing algorithms.

Optical Flow: An algorithm that estimates the displacement and speed of feature pixels between successive video frames to create a flow field.
