Trans_Proc: A Reconfigurable Processor to Implement The Linear Transformations

Atri Sanyal, Amitabha Sinha
Copyright: © 2022 | Pages: 16
DOI: 10.4018/IJSI.303575

Abstract

A reconfigurable transform processor is proposed and implemented here. First, a brief study of processors implementing different transformations is presented. We categorize the transform processor as one that can implement a number of linear transforms through reconfigurability. The theoretical suitability of the processor architecture is proved using graph theory. The complete architecture of the overall processor and of the processing element is then presented and implemented in VHDL. A complete instruction set suited to the processor is designed, and the instructions are mapped to sequences of control signals. Generating the sequence of control signals for every type of instruction yields a hardwired control unit for the processor, which is also presented. Next, the processor is fed with data to simulate it. A three-phase simulation is carried out to verify the correctness of the design. Finally, the same processor is implemented with data bus widths from 32 to 512 bits, and the implementations are compared in terms of speed and size.

1. Introduction

In this paper we propose an efficient architecture for implementing frequently used, computationally intensive linear transformations in signal and image processing. Linear transformations such as the Fast Fourier Transform (FFT), Fast Discrete Cosine Transform (FDCT) and Fast Discrete Wavelet Transform (FDWT) are computationally intensive and critical to these processing applications. Papers proposing designs in this domain fall mainly into three categories.

The first category proposes architectures that implement only a single type of linear transformation, such as the Fast Fourier Transform, as discussed in Konstantines et al. (2003), Sarada V et al. (2013), Sharon et al. (2013), Srivastava et al. (2014) and Wadekar et al. (2015), or the Fast Discrete Cosine Transform, implemented in Tseng et al. (2004, 2005). The Discrete Wavelet Transform was implemented in Shahbharami Ashadollah et al. (2009) and Petrovsky et al. (2013), and the Discrete Hilbert Transform in Reddy et al. (2014). Since the primary focus of these implementations is speed, they are mainly implemented as ASICs. A study of these papers shows a variety of algorithms for reducing the number of computationally intensive operations. We have seen multiplier-less variants, high-speed pipelining, data forwarding and step lifting techniques for implementing FFT or FDCT algorithms, which greatly decrease the computational complexity and increase the speed. We have also seen reconfigurable data paths, Radix 2x2 butterfly elements and other reconfigurable architectures with reduced area and power.

The second category proposes processors or architectures that can implement several general linear transformations, such as FFT, FDCT and FDWT, on the same architecture. These architectures include basic building blocks common to all of these transformations and must reconfigure themselves before executing a different transformation. They are therefore mainly implemented on reconfigurable platforms such as FPGAs, as seen in K Joe Hass et al. (1998), Sinha et al. (2013) and Sanyal et al. (2010, 2012, 2018). Our paper proposes a processor of this category. The third category discusses implementations of more generic image and signal applications, as seen in Vikram K.N et al. (2006), Rossi et al. (2010) and Purohit S et al. (2013).

The flow graph method is used extensively in the literature to describe linear transforms. It was proved earlier in Sanyal et al. (2020a, b), using graph theory and mathematical induction, that a MIMD processor whose processing elements are connected as a complete equi-vertex bipartite graph can reproduce any action shown in the flow graph of transformations such as FFT, FDCT and FDWT of arbitrary size. This confirms that a processor with this type of architecture can execute transforms represented by the flow graph method.

The architecture of the processing element and the overall architecture discussed in Sanyal et al. (2020a, b) are described thoroughly here, and the architecture of the control unit is presented. The data exchange procedure between the main CPU, the local image memory, and this processor and its local memory is discussed in detail. The instruction sets for the processing element and for the overall processor are described along with their corresponding control lines. Representative examples from each category of the instruction set are considered, and the step-wise control signals that implement them are discussed. The architecture requires reconfigurability because it implements several transforms on its own. The size of the architecture is recorded for processor data widths of 32, 64, 128, 256 and 512 bits. The architecture is then coded in VHDL, synthesized, and simulated using Xilinx Vivado.

The processor is simulated in three stages to verify its operation. First, the components inside the processing element (the floating point adder and multiplier) are simulated and tested. Then the longest execution sequence required by Loeffler's FDCT algorithm is tested for every processing element. Finally, the overall architecture and the data routing between the processing elements are simulated and tested. The synthesis results showing the size of the architecture at the LUT level, together with the synthesis results for power and timing, are discussed.

The rest of the paper is organized as follows. Section 2 discusses the theoretical background of the architecture. Section 3 discusses the implementation of the processor in a modular way: the overall architecture of the processor and the implementable control unit are presented, followed by the processing element level architecture, the instruction set, and the control signals implementing some representative examples from the instruction set. The data exchange procedure between the local RAM, the buffer registers and Trans_Proc is also discussed there. Section 4 discusses the step-by-step synthesis and the simulation results in terms of speed, timing and size. Finally, Section 5 presents the conclusion and the future scope of the work.
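The routing claim cited in the introduction, that processing elements connected as a complete equi-vertex bipartite graph can follow a transform's flow graph, can be illustrated with a small software model. The Python sketch below is only an illustration under stated assumptions and is not the authors' proof, instruction set or VHDL design. It assumes two banks of n processing elements that swap source and destination roles after every stage, and it uses a radix-2 decimation-in-time FFT as the example flow graph. The names `fft_on_bipartite_pes`, `bit_reverse` and `dft` are hypothetical.

```python
import cmath


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of the index i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def dft(x):
    """Direct O(n^2) DFT, used only as a reference result."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def fft_on_bipartite_pes(x):
    """Radix-2 DIT FFT modelled on two banks of n PEs (assumed model).

    In every stage, destination PE j reads two values from the source
    bank and computes one butterfly output. Each read (source, j) is an
    edge of the complete bipartite graph K_{n,n} between the two banks.
    Returns the transform and the set of routes used in each stage.
    """
    n = len(x)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    bits = n.bit_length() - 1
    # Load the source bank in bit-reversed order.
    src = [complex(x[bit_reverse(i, bits)]) for i in range(n)]
    routes = []
    half = 1
    while half < n:
        dst = [0j] * n
        stage_routes = set()
        for j in range(n):  # one destination PE per output point
            pos = j % (2 * half)
            k = pos if pos < half else pos - half
            w = cmath.exp(-2j * cmath.pi * k / (2 * half))  # twiddle factor
            if pos < half:
                top, bot = j, j + half
                dst[j] = src[top] + w * src[bot]
            else:
                top, bot = j - half, j
                dst[j] = src[top] - w * src[bot]
            stage_routes.update({(top, j), (bot, j)})
        routes.append(stage_routes)
        src = dst  # the banks swap roles for the next stage
        half *= 2
    return src, routes
```

For an 8-point input the model runs three stages, and every route recorded in `routes` is a (source PE, destination PE) pair between the two banks, so no connection outside the bipartite structure is needed. The result can be compared with `dft(x)` to check the arithmetic.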
