Accelerating Sobel Edge Detection Using Compressor Cells Over FPGAs

Ahmed Abouelfarag, Marwa Ali Elshenawy, Esraa Alaaeldin Khattab
Copyright: © 2018 |Pages: 22
DOI: 10.4018/978-1-5225-5204-8.ch047


Computer vision plays an important role in many essential human-computer interaction applications. These applications are subject to a “real-time” constraint and therefore require a fast and reliable computational system. Edge detection is the most widely used approach for segmenting images based on changes in intensity. Various kernels can be used to perform edge detection, such as Sobel, Roberts, and Prewitt, of which Sobel is the most commonly used. In this research, a novel type of operator cell that performs addition is introduced to achieve computational acceleration. The novel operator cells are deployed on the chosen FPGA platform, the ZedBoard, which is well suited to real-time image and video processing. The Sobel edge detection technique is accelerated using tools such as the High-Level Synthesis tools provided by Vivado. The enhancement decreases the computational time by 26% compared to conventional adder cells.
Chapter Preview


Computer vision, which covers acquiring, processing, analyzing, and understanding digital images in order to extract numerical information from the captured frames, attracts considerable attention from researchers. One reason is that human-computer interaction has become an increasingly important part of our daily lives. It has expanded from desktop applications to include games, learning and education, commerce, health and medical applications, emergency planning and response, and systems that support collaboration and community. Computer vision is currently used in many applications, such as augmented reality, automated optical inspection, and automatic number plate recognition. With the remarkable growth of information technology in our society, computer vision systems are expected to be embedded in many aspects of the environment.

Computer vision is challenging because of its strict timing requirements: most computer vision applications must respond immediately, within tight time frames. Real-time systems are widely used for human-computer interactive applications. Such a system must therefore meet real-world requirements and be able to control and respond to its environment effectively. A real-time application is one that produces its output within a time frame that the user perceives as immediate or current. Computational time must stay below a defined bound, usually a few milliseconds, depending on the throughput rate and the frame size.
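To make the dependence on throughput rate and frame size concrete, the following is a minimal sketch of the arithmetic. The function names and the example figures (30 frames per second, 640×480 pixels) are illustrative assumptions and are not taken from the chapter.

```cpp
#include <cassert>

// Time available to process one frame, in milliseconds.
// frames_per_second is the required throughput rate (assumed parameter).
double frame_budget_ms(double frames_per_second) {
    assert(frames_per_second > 0.0);
    return 1000.0 / frames_per_second;
}

// Average time available per pixel, in nanoseconds, for a frame of
// width x height pixels processed within one frame period.
double pixel_budget_ns(double frames_per_second, int width, int height) {
    assert(width > 0 && height > 0);
    const double pixels = static_cast<double>(width) * height;
    return frame_budget_ms(frames_per_second) * 1.0e6 / pixels;
}
```

Under these assumed figures, a 30 frames-per-second stream leaves roughly 33.3 ms per frame, and a 640×480 frame then leaves on the order of 108 ns per pixel on average.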

When building any system, the main objective is to achieve the required functionality with the highest achievable performance. However, there is no optimal solution; any solution is a trade-off between power, time, silicon area, accuracy, and many other less critical factors. It is no longer possible to boost the performance of computing algorithms by continuously raising the clock speed of the processors they run on. This has prompted the use of processors with thousands of cores, designed principally for highly parallel operations.

Real-time processing, especially video processing, imposes high computational demands alongside requirements for low power and low cost. In recent years, many approaches have been devoted to developing architectures that can meet these demands. Application-Specific Integrated Circuits (ASICs) yield the best computational throughput at low power consumption; on the other hand, ASICs lack flexibility and require long development time (Saponara, Casula, & Fanicci, 2008). The next option is the general-purpose Digital Signal Processor (DSP), which offers good programmability. However, the higher the complexity of the system, the worse its performance gets (Kumar et al., 2015).

In recent years, Graphics Processing Units (GPUs) have taken over the field of multimedia processing thanks to their massively parallel architecture (Mccool, 2008). At the same time, GPUs are not able to exploit low-level parallelism, and they occupy a larger silicon area than ASICs.

Moreover, the insatiable demand for fast and accurate response requires a corresponding improvement in computation speed and performance. That is why FPGAs are used in the presented work. Thanks to their massively parallel structure, FPGA devices boast abundant resources from which components that accelerate signal, image, and data processing can be built. In addition, currently available FPGAs offer better performance in terms of computational speed, resource capacity, and power consumption.

To accelerate a specific algorithm on an FPGA board, High-Level Synthesis (HLS) is used. HLS is an automated hardware design process that interprets the algorithmic description of a desired behavior and creates a digital hardware design implementing it. The first step in the synthesis process is defining the high-level specification of the problem; the code is then analyzed, architecturally constrained, and scheduled to create a Register-Transfer Level (RTL) description in an HDL. The goal of HLS is to give better control over optimization of the design architecture and to accelerate IP creation by enabling C, C++, and SystemC specifications to be targeted to Xilinx All Programmable devices without the need to manually create RTL.
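As an illustration of the kind of algorithmic description an HLS flow starts from, the following is a minimal sketch of the Sobel operator in plain C++. It is not the chapter's implementation: the function name, the row-major 8-bit image layout, the |Gx| + |Gy| magnitude approximation, and the zeroed border are all assumptions made for the example, and the code carries no tool-specific directives and is not tuned for synthesis.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Sobel edge magnitude on an 8-bit grayscale image stored row-major.
// Assumptions for this sketch: |Gx| + |Gy| approximates the gradient
// magnitude, results saturate at 255, and border pixels are set to 0.
void sobel(const std::uint8_t* src, std::uint8_t* dst, int width, int height) {
    assert(src != nullptr && dst != nullptr);
    assert(width >= 3 && height >= 3);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            if (y == 0 || y == height - 1 || x == 0 || x == width - 1) {
                dst[y * width + x] = 0;  // no full 3x3 neighbourhood here
                continue;
            }
            const std::uint8_t* up  = src + (y - 1) * width + x;
            const std::uint8_t* mid = src + y * width + x;
            const std::uint8_t* dn  = src + (y + 1) * width + x;
            // Horizontal gradient: right column minus left column, weights 1-2-1.
            const int gx = (up[1] + 2 * mid[1] + dn[1])
                         - (up[-1] + 2 * mid[-1] + dn[-1]);
            // Vertical gradient: bottom row minus top row, weights 1-2-1.
            const int gy = (dn[-1] + 2 * dn[0] + dn[1])
                         - (up[-1] + 2 * up[0] + up[1]);
            const int mag = std::abs(gx) + std::abs(gy);
            dst[y * width + x] = static_cast<std::uint8_t>(mag > 255 ? 255 : mag);
        }
    }
}
```

Each output pixel in this sketch is formed almost entirely from additions and subtractions of neighbouring pixels, which suggests why the speed of the adder hardware matters for this kernel.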
