Introduction
The integration of multiple functionalities on a single hardware platform is an increasing trend in embedded system design, driven by the need to reduce cost. Although the functional tasks running on these systems share resources, they do not share the same criticality. This has given rise to the concept of mixed criticality, which has been identified as one of the core foundational concepts in the emerging discipline of Cyber-Physical Systems. A mixed-critical system is an integrated suite of hardware, operating system and middleware services, and application software that supports the execution of safety-critical, mission-critical, and non-critical computer tasks within a single, secure computing platform (Barhorst et al., 2009). Each safety-critical task in these systems is characterized by a level of assurance against failure, chosen from two or more distinct safety levels. For example, IEC 61508, an international standard published by the International Electrotechnical Commission and applied in industry, defines four safety integrity levels.
In practice, many safety-critical embedded systems are real-time systems, because the execution of safety-critical tasks usually cannot be delayed. Such a task must complete correctly before its deadline; otherwise, the consequences can be catastrophic. In real-time systems, the estimate of a task's Worst-Case Execution Time (WCET) plays an important role in scheduling. When a mixed-criticality system is developed, the system designers estimate the WCETs and define them as parameters of the real-time tasks. The system is then validated by ensuring that the critical real-time tasks, including safety-critical and functionality-critical tasks, are schedulable. Some systems, such as civilian and defense avionics, are subject to mandatory certification requirements by statutory organizations (Baruah et al., 2012). These systems must be certified after design and before implementation. In the certification process, Certification Authorities (CAs) tend to be very concerned about the safety requirements. They may be more conservative than the designers in estimating the WCETs of the safety-critical tasks, and so use longer WCETs when certifying the systems. The difference between the two estimates gives rise to new and interesting real-time scheduling problems. It is well known that conventional scheduling methods cannot satisfactorily address these problems.
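To illustrate how two WCET estimates can lead to different verdicts, the following minimal sketch checks one task set twice. It assumes a model the text does not specify: implicit-deadline periodic tasks scheduled by EDF on a single processor, where the set is schedulable exactly when total utilization does not exceed 1. The `Task` class, the task names, and all numbers are hypothetical, and the check deliberately ignores any mixed-criticality-specific scheduling policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    period: float         # also the relative deadline (implicit-deadline assumption)
    wcet_designer: float  # WCET estimated by the system designers
    wcet_ca: float        # more conservative WCET assumed by the certification authority


def utilization(tasks, use_ca_estimate):
    """Sum of WCET / period over all tasks, under the chosen estimate."""
    return sum(
        (t.wcet_ca if use_ca_estimate else t.wcet_designer) / t.period
        for t in tasks
    )


def edf_schedulable(tasks, use_ca_estimate):
    """Utilization test for EDF on one processor (assumed model, see lead-in)."""
    return utilization(tasks, use_ca_estimate) <= 1.0


# Hypothetical task set: only the safety-critical task gets a longer CA estimate.
tasks = [
    Task("flight_control", period=10, wcet_designer=3, wcet_ca=7),
    Task("logging", period=20, wcet_designer=8, wcet_ca=8),
]

designer_ok = edf_schedulable(tasks, use_ca_estimate=False)
ca_ok = edf_schedulable(tasks, use_ca_estimate=True)
print(designer_ok, ca_ok)
```

With these illustrative numbers, the set passes under the designers' estimates but fails under the more conservative ones, which is the kind of gap the scheduling problems mentioned above arise from.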
In a mixed-criticality system, computation quality is also important. Faults or errors may occur during a task's execution, and they can either produce incorrect results or cause critical tasks to miss their deadlines. Faults in computer systems are classified into two types: permanent and transient. Permanent faults are those from which the system cannot recover, such as hardware damage and shutdown. Transient faults, by contrast, allow recovery once the fault has disappeared. A common example of a transient fault is the inducing of spurious values in memory cells by charged particles (e.g., alpha particles) passing through them (Krishna, 2014). In computer systems, transient faults occur much more frequently than permanent faults (Castillo et al., 1982; Iyer et al., 1986). Transient faults can be tolerated by adding redundancy, so that a task is re-executed if it completes with errors. In a real-time system, this redundancy takes the form of time redundancy, where the re-execution is carried out using slack. Slack is defined as the time between a task's completion and its deadline. If the slack is sufficiently long, the task can be re-executed within it. A system that can recover from faults is generally called a fault-tolerant system.
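The slack condition described above can be sketched as a simple check. This is a minimal illustration, not a method from the article: the `Job` class and both helper functions are hypothetical names, and the sketch assumes that a re-execution takes at most the job's WCET and starts as soon as the faulty run finishes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    wcet: float      # worst-case execution time of one run
    deadline: float  # absolute deadline


def slack(job, finish_time):
    """Time left between the job's completion and its deadline."""
    return job.deadline - finish_time


def can_reexecute(job, finish_time):
    """True if the slack is long enough to run the job once more (assumed cost: one WCET)."""
    return slack(job, finish_time) >= job.wcet


job = Job(wcet=3, deadline=10)
early = can_reexecute(job, finish_time=3)  # slack 7, enough for a 3-unit re-run
late = can_reexecute(job, finish_time=8)   # slack 2, too short for a 3-unit re-run
print(early, late)
```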