A Fault-Tolerant Scheduling Algorithm Based on Checkpointing and Redundancy for Distributed Real-Time Systems

Barkahoum Kada (Department of Computer Science, University of Batna2, Batna, Algeria) and Hamoudi Kalla (Department of Computer Science, University of Batna2, Batna, Algeria)
Copyright: © 2019 | Pages: 18
DOI: 10.4018/IJDST.2019070104

Abstract

Real-time systems are becoming ever more widely used in life-critical applications, and the need for fault-tolerant scheduling can only grow in the years ahead. This article presents a novel fault tolerance approach for tolerating transient faults in hard real-time systems. The proposed approach combines checkpointing with rollback and active replication to tolerate several transient faults. Based on this approach, a new static fault-tolerant scheduling algorithm, SFTS, is presented. It is based on a list of scheduling heuristics that satisfy the application's time constraints even in the presence of faults by exploiting the spare capacity of the available processors in the architecture. Simulation results show the performance and effectiveness of the proposed approach compared to other fault-tolerant approaches. The results reveal that, in the presence of multiple transient faults, the average timing overhead of this approach is lower than that of the checkpointing technique. Moreover, the proposed SFTS algorithm achieves a better feasibility rate in the presence of multiple transient faults.

Literature Review

Software-based fault tolerance techniques against transient faults have been studied extensively. In the software replication technique (Girault et al., 2004; Assayad et al., 2012; Samal et al., 2013; Meroufel & Belalem, 2014), multiple replicas (active or passive) of each task are executed on different processors.

Assayad et al. (2012) proposed a tri-criteria scheduling heuristic that minimizes the schedule length, the global system failure rate, and the power consumption of the generated schedule. Active replication of tasks and data dependencies is used to increase system reliability. Samal et al. (2013) used the primary-backup approach (passive replication) as a fault-tolerant scheduling technique to guarantee real-time task constraints in the presence of permanent or transient faults. The authors proposed fault-tolerant scheduling for independent tasks using a hybrid genetic algorithm.

The replication technique is effective for tolerating multiple spatial faults (permanent or transient) and is preferable for safety-critical systems (Ejlali et al., 2012). However, scheduling multiple replicas of each task on different processors may not be affordable due to cost constraints (Ropars et al., 2015).
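
To make the cost trade-off concrete, the following Python sketch compares the processor demand of active replication with the time demand of checkpointing with rollback, the two techniques named in the abstract. It is a simplified textbook-style model, not the SFTS algorithm or the authors' cost model. The names `Task`, `active_replication_cost`, and `checkpointing_cost`, as well as all parameters, are hypothetical. The model assumes fail-silent replicas, equally spaced checkpoints, and that each fault forces re-execution of exactly one full segment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    wcet: float  # worst-case execution time in the fault-free case


def active_replication_cost(task: Task, k: int) -> dict:
    """Cost of tolerating k faults with active replication.

    Assumption: replicas are fail-silent and each fault hits a different
    replica, so k + 1 replicas on distinct processors are enough.
    """
    return {"processors": k + 1, "finish_time": task.wcet}


def checkpointing_cost(task: Task, k: int, n_checkpoints: int,
                       save_overhead: float, rollback_overhead: float) -> dict:
    """Worst-case cost of tolerating k faults with checkpointing and rollback.

    Assumption: checkpoints split the task into equal segments, and each
    fault costs one rollback plus the re-execution of one whole segment.
    """
    if n_checkpoints < 1:
        raise ValueError("n_checkpoints must be at least 1")
    segment = task.wcet / n_checkpoints
    fault_free = task.wcet + n_checkpoints * save_overhead
    recovery = k * (segment + rollback_overhead)
    return {"processors": 1, "finish_time": fault_free + recovery}


if __name__ == "__main__":
    task = Task(wcet=100.0)
    print(active_replication_cost(task, k=2))
    print(checkpointing_cost(task, k=2, n_checkpoints=4,
                             save_overhead=2.0, rollback_overhead=1.0))
```

Under these assumptions, replication pays for fault tolerance in processors while checkpointing pays in time on a single processor, which is the tension a combined approach has to balance against task deadlines.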
