1. Introduction
Network-on-chip (NoC) has emerged as an efficient architecture for managing communication in a system on chip (SoC), where a large number of components and storage blocks are integrated on a single chip. This intensification of communication raises important concerns about performance and energy consumption. In addition, shrinking transistor sizes have made semiconductors more sensitive to faults. The challenge is therefore to maintain system functionality throughout the operational lifetime and to ensure that system performance is preserved. For this reason, researchers have attached great importance to reliability in NoCs. Faults that may occur in a NoC can be divided into two main categories: permanent faults (or hard faults) and temporary faults (or soft faults) (Radetzki et al., 2013). Soft faults are further classified as transient or intermittent. These three types of faults (permanent, transient, and intermittent) can be caused by several internal or external factors. The majority of failures (80%) are caused by transient faults, while the rest originate mainly from permanent and intermittent faults (Lehtonen et al., 2007).
Faults in different components of the NoC have different causes; however, all of them can have serious consequences such as loss of packet data, misrouting, deadlocks, and malfunctions. It follows that communication reliability becomes a pressing challenge when designing a NoC. Communication also has a major impact on the performance of the NoC, so it is desirable to design efficient algorithms that preserve it. Faults in routers or links are the main cause of packet transmission failures in a NoC. The communication performance of a NoC depends heavily on the routing algorithm, which determines the path that each packet follows between its source and destination nodes.
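To make the role of a routing algorithm concrete, the following is a minimal sketch of dimension-order (XY) routing on a 2D mesh, a common baseline in the NoC literature. It is not the algorithm proposed in this paper; the mesh topology, the `(x, y)` coordinate scheme, and the function name `xy_route` are assumptions chosen for illustration.

```python
def xy_route(src, dst):
    """Return the routers visited from src to dst on a 2D mesh.

    Illustrative assumption: routers are addressed by (x, y) tuples and
    a packet first travels along X, then along Y (dimension-order routing).
    """
    x, y = src
    path = [(x, y)]
    # Phase 1: correct the X coordinate one hop at a time.
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    # Phase 2: correct the Y coordinate one hop at a time.
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path
```

Because the path is fixed by the source and destination alone, a single faulty router or link on that path blocks the packet, which is why fault-tolerant alternatives are needed.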
A fault-tolerant routing algorithm finds a new path to steer packets from sources to destinations in a faulty network. By choosing an optimal path, the routing algorithm can efficiently increase the performance of the network. Congestion is another key factor, as it increases transmission delay and power consumption. Routing algorithms can therefore improve performance by re-routing packets through less congested regions and distributing traffic over the network. Finally, failures and congestion should be managed effectively to ensure the availability and robustness of the NoC.
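As a rough illustration of finding a new path in a faulty network, the sketch below runs a breadth-first search over the healthy routers and links of a 2D mesh. It is a generic textbook technique, not the method proposed in this paper; the mesh topology, the function name `fault_aware_route`, and the representation of faults as sets are all assumptions. Congestion is not modelled here.

```python
from collections import deque


def fault_aware_route(width, height, src, dst,
                      faulty_routers=frozenset(), faulty_links=frozenset()):
    """Shortest path from src to dst that avoids faulty routers and links.

    Illustrative assumptions: routers are (x, y) tuples on a width x height
    mesh, and a faulty link is a frozenset of its two endpoint routers.
    Returns a list of routers, or None if dst is unreachable.
    """
    if src in faulty_routers or dst in faulty_routers:
        return None
    parent = {src: None}  # also serves as the visited set
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Walk the parent pointers back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            if not (0 <= nx < width and 0 <= ny < height):
                continue  # outside the mesh
            if nxt in parent or nxt in faulty_routers:
                continue  # already reached, or router is down
            if frozenset((node, nxt)) in faulty_links:
                continue  # link is down
            parent[nxt] = node
            queue.append(nxt)
    return None
```

For example, on a 3 x 3 mesh with router `(1, 0)` marked faulty, a packet from `(0, 0)` to `(2, 0)` is steered around the fault through `(0, 1)`, `(1, 1)`, and `(2, 1)`.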
In this context, many fault-tolerant routing algorithms have been proposed for critical applications. It is therefore essential to handle failures and to ensure correct and continuous operation of the circuit in its environment, even when the failure rate is high.
To achieve these objectives, we propose a new approach in the domain of fault-tolerant NoCs with two main contributions. Firstly, we propose a unified fault model that includes transient faults, permanent faults, and congestion, which is treated as a fault. Secondly, we present a new architecture based on sub-nets. This architecture achieves low latency and increases the network bandwidth. Additionally, it is capable of handling multiple link and router failures up to 40% and does not use any virtual channels (VCs). Finally, the solution is deadlock-free and congestion-aware.
In this paper, we describe DINRA, a solution that jointly addresses congestion management and fault tolerance in NoCs. The new architecture offers several advantages, such as reduced latency and the use of alternative paths to route packets in the case of faulty links and/or routers. The rest of the paper is organized as follows. Section 2 gives a brief overview of the work. The new architecture is presented in Section 3. Implementation details of the proposed solution are given in Section 4. In Section 5, DINRA-FTNoC is evaluated. Our conclusions are drawn in the final section.