Performance Optimization of Tridiagonal Matrix Algorithm [TDMA] on Multicore Architectures: Computational Framework and Mathematical Modelling

Anishchandran Chathalingath, Arun Manoharan
Copyright: © 2019 | Pages: 12
DOI: 10.4018/IJGHPC.2019100101


Fast and efficient tridiagonal solvers are highly valued in the scientific and engineering domains, but they pose a challenging optimization task for computer engineers. State-of-the-art developments in multi-core computing pave the way to meeting this challenge to an extent: technical advances in multi-core architectures provide opportunities to exploit lower levels of parallelism and concurrency even for inherently sequential algorithms. In this article, the authors present a performance-optimal pipelined parallel variant of the conventional Tridiagonal Matrix Algorithm (TDMA), also known as the Thomas algorithm, on a multi-core CPU platform. The implementation, analysis, and performance comparison of the proposed pipelined parallel TDMA and the conventional version are carried out on an Intel SIMD multi-core architecture, and the results are compared in terms of elapsed time, speedup, and cache miss rate. For a system of n linear equations with n = 2^36, the presented pipelined parallel TDMA initially achieves a speedup of 1.294x with a parallel efficiency of 43%, and tends towards linear speedup as the system size grows.
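For reference, the conventional Thomas algorithm that the pipelined variant builds on can be sketched as follows. This is a generic textbook formulation in Python, not the authors' pipelined implementation; the function name and argument layout are illustrative assumptions.

```python
def thomas_solve(a, b, c, d):
    """Solve a tridiagonal system A x = d by the Thomas algorithm (TDMA).

    a: sub-diagonal   (a[0] unused)
    b: main diagonal
    c: super-diagonal (c[-1] unused)
    d: right-hand side
    Returns the solution vector x as a new list.
    """
    n = len(d)
    cp = [0.0] * n  # modified super-diagonal coefficients
    dp = [0.0] * n  # modified right-hand side
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    # Forward sweep: eliminate the sub-diagonal (inherently sequential,
    # since each step depends on the previous row's result).
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    # Back substitution, also sequential but in the reverse direction.
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x
```

The two sequential sweeps are exactly the dependency chains that make TDMA hard to parallelize, and that a pipelined scheme targets by overlapping work across multiple independent systems or stages.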

Tridiagonal system solvers have been recognized as a vital building block in many scientific and engineering applications, especially when dealing with problems in dynamics (Zhou, 2017; Sakharnykh, 2009). Although many parallel and sequential algorithms are available for solving tridiagonal systems, a high-performance, architecture-independent tridiagonal solver library remains a challenging goal. The sparseness, low operational intensity, and parallel computational complexity of tridiagonal matrices demand custom hardware, such as field-programmable gate arrays (FPGAs) or graphics processing units (GPUs), for developing high-performance solvers.

Many efforts have been made to develop performance-optimal tridiagonal solvers via software optimization, custom hardware platforms, and hybrid parallel algorithms (Chang, 2014). For solving the discretized Poisson equation on a rectangular domain, Hockney et al. developed the Cyclic Reduction (CR) algorithm. The CR method performs odd-even reduction in two phases, namely forward reduction and backward substitution, both of which can be done in parallel. The authors also proposed a more efficient parallel version of CR, called Parallel Cyclic Reduction (PCR), which performs only forward reduction (Hockney, 1981). Zhang et al. carried out a comprehensive analysis of the three major tridiagonal solving algorithms, namely CR, PCR, and Recursive Doubling (RD), proposed hybrid models of those algorithms, and made a comparative performance study on GPU implementations (Zhang et al., 2010). For fluid simulation, Sakharnykh presented the PCR-pThomas algorithm on a GPU platform, following a thread-level Thomas algorithm (Sakharnykh, 2009). Wang et al. optimized the PCR-pThomas algorithm on the Intel MIC architecture through efficient exploitation of registers and the in-core vectorization feature, and then introduced an improved variant, the register-PCR-half-pThomas algorithm, with minimal computational cost and memory utilization (Wang et al., 2016). Tang et al. proposed a hybrid parallel tridiagonal solver that combines CR with the partition method, an approach based on the divide-and-conquer technique (Tang et al., 2014). Quesada-Barriuso et al. compared the parallel efficiency of the fine-grained CR algorithm against the coarse-grained Bondelli algorithm on both multicore CPU architectures and GPU platforms (Barriuso et al., 2011). It is clear that the high computational complexity of parallel high-performance tridiagonal solvers demands immense computational power, which justifies the use of coprocessors such as GPUs, FPGAs, and MICs.
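The two CR phases described above, forward (odd-even) reduction and backward substitution, can be sketched as follows. This is a standard textbook formulation assuming a system size of n = 2^q - 1, not any of the cited GPU or MIC implementations; the function name is illustrative.

```python
import math

def cr_solve(a, b, c, d):
    """Cyclic Reduction for a tridiagonal system A x = d with n = 2^q - 1
    unknowns. a: sub-diagonal (a[0] unused), b: main diagonal,
    c: super-diagonal (c[-1] unused). Returns the solution x."""
    n = len(d)
    q = int(math.log2(n + 1))
    assert n == 2 ** q - 1, "this sketch requires n = 2^q - 1"
    a, b, c, d = list(a), list(b), list(c), list(d)
    a[0] = 0.0   # no coupling outside the domain
    c[-1] = 0.0

    # Forward reduction: each sweep eliminates every other unknown.
    # The iterations of the inner loop are independent, which is what
    # makes this phase parallelizable.
    stride = 1
    for _ in range(q - 1):
        for i in range(2 * stride - 1, n, 2 * stride):
            lo, hi = i - stride, i + stride
            alpha = -a[i] / b[lo]
            beta = -c[i] / b[hi]
            b[i] += alpha * c[lo] + beta * a[hi]
            d[i] += alpha * d[lo] + beta * d[hi]
            a[i] = alpha * a[lo]
            c[i] = beta * c[hi]
        stride *= 2

    # Backward substitution: solve the middle unknown first, then fill in
    # the rest level by level; out-of-range neighbours count as zero.
    x = [0.0] * n
    stride = (n + 1) // 2
    while stride >= 1:
        for i in range(stride - 1, n, 2 * stride):
            xl = x[i - stride] if i - stride >= 0 else 0.0
            xr = x[i + stride] if i + stride < n else 0.0
            x[i] = (d[i] - a[i] * xl - c[i] * xr) / b[i]
        stride //= 2
    return x
```

PCR differs in that every equation is reduced at every level, so only forward-style sweeps are needed and the backward phase disappears, at the cost of more arithmetic per level.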
