Introduction
On-chip multiprocessing has reached the mainstream in server, desktop and laptop computers, and is increasingly being applied to embedded systems. As 45nm and 32nm process technologies mature, we approach the mark of 2 billion transistors on a single silicon die, allowing for the on-chip integration of dozens of processing cores. This is an increase of one order of magnitude compared with processors that are commercially available. As the number of cores per chip increases, on-chip communication plays an increasingly important role in system performance, power dissipation, chip area, design effort and total cost. Since early solutions based on point-to-point interconnects or on-chip buses do not scale and quickly become a communication bottleneck (Furber & Bainbridge, 2005; Henkel, Wolf, & Chakradhar, 2004), Networks-on-Chip (NoCs) have emerged as a suitable architectural template for on-chip communication (Benini & De Micheli, 2002; Dally & Towles, 2001). Such a template offers a generic communication platform which can be parameterised and reused for a large number of SoC designs.
As an integration platform, the NoC should be able to provide different levels of service for various applications on the same network (Benini & De Micheli, 2001). Some applications have very stringent communication service requirements: their correctness depends not only on the communication result but also on its completion within a time bound. A data packet received by a destination too late could be useless. These critical communications are called real-time communications. For a packet transmitted over the network, the relevant time metric is the packet network latency, and the worst-case acceptable latency is defined to be the deadline of the packet. A traffic-flow is a packet stream which traverses the same route from the source to the destination and requires the same grade of service along the path. For hard real-time traffic-flows, all the packets generated by the traffic-flow must be delivered before their deadlines, even under worst-case scenarios. A set of real-time traffic-flows over the network is termed schedulable if all the packets belonging to these traffic-flows meet their deadlines under any arrival order of the packet set.
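The schedulability definition above can be illustrated with a minimal Python sketch. It assumes that an upper bound on each traffic-flow's network latency is already available from some analysis; the names `TrafficFlow`, `worst_case_latency` and `is_schedulable`, and the choice of clock cycles as the time unit, are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class TrafficFlow:
    """Hypothetical record for one real-time traffic-flow."""
    name: str
    deadline: int            # worst-case acceptable network latency (assumed unit: cycles)
    worst_case_latency: int  # upper bound on network latency, assumed to be given


def is_schedulable(flows: Iterable[TrafficFlow]) -> bool:
    """A flow set is schedulable if every flow's latency bound meets its deadline."""
    return all(f.worst_case_latency <= f.deadline for f in flows)


flows = [
    TrafficFlow("sensor", deadline=100, worst_case_latency=80),
    TrafficFlow("control", deadline=50, worst_case_latency=50),
]
late = flows + [TrafficFlow("video", deadline=200, worst_case_latency=230)]
```

Here `is_schedulable(flows)` holds, while adding the `video` flow, whose bound exceeds its deadline, makes the set unschedulable. The hard part in practice is obtaining a safe latency bound, which this sketch simply takes as input.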
As a popular switching control technique, wormhole switching (Ni & McKinley, 1993) has been widely applied in on-chip networks due to its high throughput and low buffering requirements (Kavaldjiev & Smit, 2003). However, the analysis of real-time packet schedulability for wormhole switching networks is still an open problem. Predictable behaviour of the network services is essential to support real-time requirements, particularly the satisfaction of deadline bounds. The behaviour of on-chip wormhole networks, however, is partially non-deterministic because of contention in communication. In on-chip networks, several tasks running on different nodes exchange information periodically. During a transmission period, each transmitted packet shares resources, such as buffers or physical links, with other packets. When several packets try to access the same resource at the same time, contention occurs: the network can serve only one packet and must suspend the others according to some arbitration policy. Once a packet becomes blocked, it can block other packets, which can in turn block other packets, and so on. The exact analysis of congestion in this situation is hard because a packet may become blocked at several routers during its journey from source to destination. Contention leads to packet delays and even missed deadlines. Therefore, an arbitration strategy and an analysis approach are needed to predict whether all the real-time packets can meet their timing requirements.
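The chained blocking described above can be made concrete with a small Python sketch. It assumes each traffic-flow is described only by the set of physical links on its route, treats any two flows that share a link as able to contend, and follows those relations transitively. The function names and the example routes are hypothetical, and because no arbitration policy is modelled the result is a coarse over-approximation, not an analysis method.

```python
from itertools import combinations


def direct_contention(routes):
    """Map each flow to the flows sharing at least one link with it.

    routes: dict of flow name -> iterable of links (any hashable value).
    """
    shares = {name: set() for name in routes}
    for a, b in combinations(routes, 2):
        if set(routes[a]) & set(routes[b]):
            shares[a].add(b)
            shares[b].add(a)
    return shares


def may_block(routes, start):
    """Flows that could delay `start` directly or through a chain of blocking."""
    shares = direct_contention(routes)
    seen, stack = set(), [start]
    while stack:
        current = stack.pop()
        for other in shares[current]:
            if other != start and other not in seen:
                seen.add(other)
                stack.append(other)
    return seen


# Hypothetical routes: each link is a (router, router) pair.
routes = {
    "f1": [("A", "B"), ("B", "C")],
    "f2": [("B", "C"), ("C", "D")],
    "f3": [("C", "D"), ("D", "E")],
    "f4": [("X", "Y")],
}
```

With these routes, `f1` shares a link only with `f2`, yet `may_block(routes, "f1")` also contains `f3`, because `f3` can block `f2` further along its path. The isolated flow `f4` is never involved.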