A Network-On-Chip is a paradigm that tackles limitations of traditional bus-based interconnects. It allows complex applications that demand many resources to be deployed on many-core platforms effectively. To satisfy requirements on dependability, however, a NoC platform requires dynamic monitoring and reconfiguration mechanisms. In this chapter, the authors propose an agent-based management system that monitors the state of the platform and applies various reconfiguration techniques. These techniques aim at enabling uninterruptable execution of applications satisfying dependability requirements. The authors develop the proposed system within Event-B that provides a means for stepwise and correct-by-construction specification supported by mathematical proofs. Furthermore, the authors show the mechanism of decomposition of Event-B specifications such that a well-structured and hierarchical agent-based management system is derived.
TopIntroduction
Modern applications are complex and demand a large number of resources for their intensive computations. Furthermore, the computations have to be carried out in an efficient manner. To fulfil these demands, Network-on-Chip (NoC) has been proposed as an efficient and scalable interconnect paradigm for many-core platforms (Benini & De Micheli, 2002). The cores of a platform are interconnected with one structured net so that they can exchange data efficiently. However, the computations may be interrupted by various faults such as faults caused by high temperature of cores or hardware faults that occur in cores. This may lead to inappropriate results, especially for the applications from critical domains such as biomedical (Khatib, et al., 2006) or aerospace (Motamedi, Ionnides, Rümmeli, & Schagaev, 2009). To allow an application to complete its computations without interruption (i.e., produce the expected result), it is necessary to provide run-time means for monitoring the state of platform and reacting appropriately on changes.
These means are usually implemented in the form of agents (Rantala, Isoaho, & Tenhunen, 2007). The agents monitor the state of the platform and react to non-desired changes caused by faults by performing different procedures to tolerate these faults. Generally, the bigger the NoC platform the larger the number of agents it requires. In order for the system to manage a large number of agents, they are organised into a multi-level hierarchy. The hierarchy typically has a three-level structure for centralised schemes (Yin, et al., 2009) allowing agents to exchange the information about the state of the NoC platform efficiently and effectively.
The dependability of the system should also be maintained at a high level since inadequate behaviour of agents can cause dramatic effects. In other words, the proposed system has to be correct w.r.t. the requirements. One of the appropriate approaches for specifying reliable systems, as well as verifying their design is provided by formal methods. Formal development of NoC platforms and their agent-based monitoring systems allows us to prove their correctness mathematically by employing a stepwise and correct-by-construction approach. In this paper, we stepwise develop a specification of a hierarchical agent-based monitoring system for NoC platforms using the Event-B formalism (Abrial, 2010). Furthermore, we introduce various dynamic reconfiguration procedures into the specification on local (regional) and global (inter-regional) levels enabling effective tolerance of faults occurring in the platform. Event-B enables modelling a discrete transition system and proving the correctness of a model following the refinement-based approach. Moreover, it has a mature tool support via the Rodin platform (RODIN, 2011).
The rest of the paper is organised as follows. In the next section, we consider an agent-based management system for a NoC platform and briefly describe possible reconfiguration of the platform in case of faults. The section “The Event-B formalism” explains the essential parts of the Event-B framework used to specify the system. Furthermore, we show proof obligations that allow us to prove correctness of the specification. In addition, we describe possible decompositions of a model, which help us to cope with the model complexity. In the “Proposed approach” section, we describe the methodological steps for deriving correct agent-based systems. In the “Modelling an agent-based monitoring system for NoC platforms in Event-B” section, we develop the specification of the proposed system within the Event-B formalism. Finally, in the “Related work” section, we elaborate on related work and in the “Conclusion” section, we conclude this paper.