A New Fault Tolerant Routing Algorithm for Networks on Chip

Chakib Nehnouh (University of Oran1 Ahmed Ben Bella, Oran, Algeria) and Mohamed Senouci (University of Oran1 Ahmed Ben Bella, Oran, Algeria)
DOI: 10.4018/IJERTCS.2019070105


To provide correct data transmission and meet communication requirements, a routing algorithm must find a new path to steer packets from the source to the destination in a faulty network. Many solutions have been proposed to overcome faults in networks-on-chip (NoCs). This article introduces a new fault-tolerant routing algorithm that tolerates permanent and transient faults in NoCs. The solution, called DINRA, provides congestion avoidance and fault tolerance simultaneously. It is a novel approach inspired by Catnap that uses local and global congestion detection mechanisms with a hierarchical sub-network architecture. The evaluation (of reliability, latency, and throughput) shows that the approach improves NoC performance compared to the state of the art. In addition, with a test module and a fault register integrated into the basic architecture, the routers can detect faults dynamically and re-route packets to fault-free and congestion-free zones.

1. Introduction

The network-on-chip (NoC) has emerged as an efficient architecture for managing communication in a system on chip (SoC), where a large number of components and storage blocks are integrated on a single chip. This intensification of communication raises important questions about performance and energy consumption. At the same time, shrinking transistor sizes have made semiconductors more sensitive to faults. The challenge is therefore to maintain system functionality throughout the operational lifetime and to ensure that system performance is preserved. For this reason, researchers have attached great importance to reliability in networks on chip. Faults that may occur in networks on chip can be divided into two main categories: permanent faults (or hard faults) and temporary faults (or soft faults) (Radetzki et al., 2013). Soft faults are further classified as transient or intermittent. These three types of faults (permanent, transient, and intermittent) can be caused by several internal or external factors. The majority of failures (80%) are caused by transient faults, whilst the rest originate mainly from permanent and intermittent faults (Lehtonen et al., 2007).

Faults in different components of the NoC have different causes; however, all of them can have serious consequences such as loss of packet data, misrouting, deadlocks, and malfunctions. It follows that the reliability of communication becomes a major challenge when designing a NoC. Communication also has a huge impact on the performance of the network on chip, so it is desirable to design efficient algorithms that preserve it. Faults in routers or links are the major cause of failed packet transmission in a NoC. The communication performance of a NoC depends heavily on the routing algorithm, which determines the path that each packet follows between the source and destination nodes.

Fault-tolerant routing is the process of finding a new path to steer packets from sources to destinations in a faulty network. By choosing an optimal path, the routing algorithm can efficiently increase the performance of the network. Congestion is another key factor, because it increases transmission delay and power consumption. Routing algorithms can therefore improve performance by re-routing packets through less congested regions and distributing traffic over the network. Finally, failures and congestion should be managed effectively to ensure the availability and robustness of the network on chip.
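
The idea of steering packets around faulty and congested regions can be illustrated with a minimal sketch. It is not the DINRA algorithm: it assumes a 2D mesh topology and a centralized shortest-path search (Dijkstra), and the names `find_route`, `faulty_routers`, `faulty_links`, and `congestion` are hypothetical and chosen only for illustration. Faulty routers and links are excluded from the search, and a per-router congestion penalty is added to the hop cost.

```python
import heapq


def find_route(width, height, src, dst,
               faulty_routers=frozenset(), faulty_links=frozenset(),
               congestion=None):
    """Return a list of (x, y) routers from src to dst, or None.

    Illustrative assumptions (not taken from the article):
    - the NoC is a width x height 2D mesh;
    - faulty_links holds frozenset({a, b}) pairs of adjacent routers;
    - congestion maps a router to an extra cost for entering it.
    """
    congestion = congestion or {}
    if src in faulty_routers or dst in faulty_routers:
        return None

    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == dst:
            # Walk the predecessor chain back to the source.
            path = [node]
            while node in prev:
                node = prev[node]
                path.append(node)
            return path[::-1]
        if cost > dist.get(node, float("inf")):
            continue  # stale heap entry
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            if not (0 <= nx < width and 0 <= ny < height):
                continue  # outside the mesh
            if nxt in faulty_routers:
                continue  # skip faulty routers
            if frozenset((node, nxt)) in faulty_links:
                continue  # skip faulty links
            new_cost = cost + 1 + congestion.get(nxt, 0)
            if new_cost < dist.get(nxt, float("inf")):
                dist[nxt] = new_cost
                prev[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))
    return None  # destination unreachable


# Router (1, 0) is faulty, so the packet detours through row y = 1.
print(find_route(3, 3, (0, 0), (2, 0), faulty_routers={(1, 0)}))
```

A real NoC router decides hop by hop using only local or regional information; the global search here only shows the effect that a fault-tolerant, congestion-aware algorithm aims for.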

In this context, many fault-tolerant routing algorithms have been proposed for critical applications. It is therefore essential to handle failures and to ensure correct and continuous operation of the circuit in its environment, even when the failure rate is high.

To achieve these objectives, we propose a new approach in the domain of fault-tolerant NoCs with two main contributions. Firstly, we propose a unified fault model that includes transient faults, permanent faults, and congestion, which is treated as a fault. Secondly, we present a new architecture based on sub-nets. This architecture is able to achieve low latency and increase the network bandwidth. Additionally, our architecture is capable of handling multiple link and router failures, up to 40%, and does not use any virtual channels (VCs). Finally, the solution is deadlock-free and congestion-aware.
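
One way to picture a unified fault model, in which congestion is handled like a fault, is the sketch below. It is only an assumption-laden illustration and not the model defined in the article: the names `FaultKind` and `classify_port`, and the thresholds `permanent_after` and `congestion_threshold`, are hypothetical.

```python
from enum import Enum, auto


class FaultKind(Enum):
    NONE = auto()
    TRANSIENT = auto()
    PERMANENT = auto()
    CONGESTION = auto()  # congestion is treated as one more fault kind


def classify_port(consecutive_errors, buffer_occupancy,
                  permanent_after=3, congestion_threshold=0.75):
    """Map raw observations on an output port to a single fault kind.

    Hypothetical rule: repeated errors are treated as permanent, isolated
    errors as transient, and a nearly full buffer as congestion.
    """
    if consecutive_errors >= permanent_after:
        return FaultKind.PERMANENT
    if consecutive_errors > 0:
        return FaultKind.TRANSIENT
    if buffer_occupancy >= congestion_threshold:
        return FaultKind.CONGESTION
    return FaultKind.NONE


def is_usable(kind):
    """A port is avoided for any fault kind, including congestion."""
    return kind is FaultKind.NONE


print(classify_port(0, 0.9))  # a congested port is reported like a fault
```

With a single status per port, the routing decision can apply the same avoidance rule regardless of whether the cause is a hard fault, a soft fault, or congestion.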

In this paper, we describe a solution, called DINRA, that jointly addresses congestion management and fault tolerance in NoCs. The new architecture offers several advantages, such as reduced latency and the use of alternative paths to route packets when links and/or routers are faulty. The rest of the paper is organized as follows. Section 2 gives a brief overview of the work. The new architecture is presented in Section 3. Implementation details of the proposed solution are given in Section 4. In Section 5, DINRA-FTNoC is evaluated. Our conclusions are drawn in the final section.
