1.1. HSA Architecture
In a conventional computing architecture, a single CPU, or a single multi-core CPU, handles all operations. Executing massive, high-speed workloads therefore requires many CPUs, which increases both hardware cost and power consumption. The benefit of Heterogeneous System Architecture (HSA), which integrates CPUs and GPUs, is that it identifies operations with different properties inside an algorithm and dispatches each to the processor, CPU or GPU, whose hardware architecture suits that operation better. Thus, we can not only match specific algorithms to the most suitable hardware but also execute operations on GPUs and CPUs in parallel to accelerate computation.
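The dispatching idea above can be sketched in CUDA. This is a minimal illustration, not the system described in this paper: the kernel, array size, and scale factor are hypothetical. It shows the key mechanism HSA-style designs exploit, namely that a kernel launch is asynchronous, so the CPU remains free to run serial work while the GPU processes a data-parallel operation.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Data-parallel operation: every element is scaled independently,
// so it is a good candidate to dispatch to the GPU.
__global__ void scale(float *d, float k, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per element
    if (i < n) d[i] *= k;
}

int main() {
    const int n = 1 << 20;
    float *h = new float[n];
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    float *d;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);

    // The launch returns immediately; the GPU computes in the background.
    scale<<<(n + 255) / 256, 256>>>(d, 2.0f, n);

    // ... serial, branch-heavy CPU work could run here, in parallel
    //     with the GPU kernel ...

    // Copying the result back implicitly synchronizes with the kernel.
    cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("h[0] = %f\n", h[0]);

    cudaFree(d);
    delete[] h;
    return 0;
}
```

The same division of labor generalizes: operations with high data parallelism go to the GPU, while control-heavy or serial steps stay on the CPU.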
1.2. Hardware Architecture of GPU
GPUs were originally designed as hardware for graphics rendering, but in recent years GPU manufacturers such as AMD and NVIDIA have developed techniques that exploit the large number of cores inside GPUs for general-purpose computation. In this research, we adopt the Kepler microarchitecture developed by NVIDIA, whose core unit is called the Streaming Multiprocessor (SMX). Each SMX consists of 192 CUDA cores, and, as shown in Figure 1, each CUDA core can be treated as a thread-processing unit. Each GPU chip contains more than one SMX, which explains the advantage of GPUs for parallel computing. However, because GPUs adopt a Single Instruction, Multiple Data (SIMD) architecture (Garland et al., 2008; Nickolls et al., 2008), they are best suited to algorithms whose data elements are highly independent of one another. For algorithms whose data elements are highly dependent on one another, GPU performance may be lower than that of CPUs.
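The contrast between data-independent and data-dependent operations can be made concrete with two short routines. Both are illustrative examples of my own, not taken from this paper: a SAXPY kernel, where every output element depends only on inputs at the same index and thousands of threads can execute the same instruction in lockstep, and a serial recurrence, where each step needs the previous result and the GPU's wide execution units bring no benefit.

```cuda
// SIMD-friendly: no cross-element dependence, so each of the 192 CUDA
// cores in an SMX can process a different element simultaneously.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];   // element i touches only index i
}

// Data-dependent: x[i] needs x[i-1], forcing serial execution.
// An operation like this is typically better left on the CPU
// (shown here as ordinary host code).
void prefix_recurrence(int n, const float *a, float *x) {
    for (int i = 1; i < n; ++i)
        x[i] = x[i - 1] + a[i];          // each step waits on the last
}
```

In practice, parallel variants of some recurrences exist (e.g., parallel prefix sums), but they require restructuring the algorithm; in their naive form, dependent operations cannot exploit SIMD hardware.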