The Consistence of Checkpointing and Rollback Recovery Scheme

Zhenpeng Xu (Jiangsu Automation Research Institute, Lianyungang, China), Hairong Chen (Jiangsu Automation Research Institute, Lianyungang, China) and Weini Zeng (Jiangsu Automation Research Institute, Lianyungang, China)
DOI: 10.4018/IJAPUC.2015100101


For traditional distributed computing systems, message logging conditions have been specified to keep the state consistent among the distributed processes. Because mobile computing systems introduce many new characteristics, a new sufficient logging requirement has to be specified for mobile computing to avoid possible state inconsistency between the mobile nodes and the static nodes during rollback recovery. First, the paper extends the definitions of process-state inconsistency and of the nondeterministic event, independently of any specific log-based fault-tolerant scheme. Then, a novel logging consistency condition for mobile computing systems is derived from the extended definitions and the Piece-Wise Deterministic model. The proposed condition is a practical and efficient constraint for mobile computing in the presence of possible failures.

1. Introduction

For fault tolerance in mobile computing, checkpointing and rollback recovery are well-known backward error recovery techniques that minimize the loss of computation in the presence of process faults (Kuang et al., 2014; Meroufel et al., 2014). Transparent fault-tolerant schemes, which do not require user interaction, can be classified into two categories: checkpoint-based and log-based rollback recovery schemes (Islam et al., 2014; Mendizabal et al., 2014; Awasthi et al., 2014). In a log-based rollback recovery scheme, each process typically records both the content and the causal delivery relations of all the messages it has delivered in a location (called a message log) that will survive the failure of the process (Chen et al., 2005). Each saved state of a process is called a checkpoint; checkpoints reduce the number of logged events that must be replayed during the recovery phase (Elnozahy et al., 2002). Upon a process failure, a rollback recovery mechanism brings the failed process back to normal operation by reloading the checkpoint and then replaying the message log from that point (Elnozahy et al., 2002). Commonly, log-based rollback recovery schemes require that, once the set of failed processes has recovered, their states be consistent with the states of the other, failure-free processes (Chen et al., 2005; Elnozahy et al., 2002). This consistency requirement is usually expressed in terms of orphan processes, that is, processes whose state is inconsistent with the recovered state of another process (Alvisi et al., 1998).
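The checkpoint-and-replay idea described above can be illustrated with a minimal sketch. It is not the scheme proposed in the paper: the `Process` class, its integer state, and the arithmetic state transition are hypothetical choices made only for illustration, and the message log is assumed to sit on storage that survives the failure.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Process:
    """Hypothetical piecewise-deterministic process: state changes only on delivery."""
    state: int = 0
    delivered: int = 0                    # number of messages delivered so far
    checkpoint: Tuple[int, int] = (0, 0)  # (state, delivered), assumed on stable storage
    log: List[int] = field(default_factory=list)  # message contents in delivery order

    def _apply(self, msg: int) -> None:
        # Deterministic, order-sensitive transition (an arbitrary illustrative choice).
        self.state = self.state * 31 + msg
        self.delivered += 1

    def deliver(self, msg: int) -> None:
        # Log the content first; the list position records the delivery order.
        self.log.append(msg)
        self._apply(msg)

    def take_checkpoint(self) -> None:
        self.checkpoint = (self.state, self.delivered)

    def crash(self) -> None:
        # Volatile state is lost; checkpoint and log are assumed to survive.
        self.state, self.delivered = -1, -1

    def recover(self) -> None:
        # Reload the checkpoint, then replay only the messages logged after it.
        self.state, self.delivered = self.checkpoint
        for msg in self.log[self.delivered:]:
            self._apply(msg)


p = Process()
for m in (3, 5):
    p.deliver(m)
p.take_checkpoint()
for m in (7, 11):
    p.deliver(m)
state_before_failure = p.state
p.crash()
p.recover()
print(p.state == state_before_failure)
```

Because the transition is deterministic and the log preserves delivery order, replaying the two messages logged after the checkpoint rebuilds the pre-failure state. If one of those messages were missing from the log, the recovered state would differ, and any process that depended on the lost state would become an orphan in the sense described above.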

For traditional wired distributed computing systems, two orphan-free consistency conditions have already been proposed for consistent recovery: the No-Orphans Consistency Condition (NOCC) (Alvisi et al., 1998) and the Orphan-free Consistency Condition (OCC) (Xu et al., 2013). Because of the topology differences between traditional distributed computing systems and mobile computing systems, neither NOCC nor OCC is suitable for mobile computing, as they do not consider the events related to mobile computing stations. For mobile computing, PCRD is described in general terms of the state interval (Park et al., 2002). However, it may still lead to an orphan-inconsistent recovery when the recovery involves rollback propagation to failure-free processes, since the definition only specifies the lost state interval of the failed process. Furthermore, PCRD does not specify the particular message logging requirement for an orphan-free consistent recovery (Park et al., 2002).

Mobile computing introduces many new characteristics, such as mobility, disconnections, finite power sources, vulnerability to physical damage, and a lack of stable storage (Park et al., 2003; Gupta et al., 2008). As a result, wireless network connections are more fragile, and mobile hosts are much less reliable, than their counterparts in traditional wired distributed computing. Mobile hosts may disconnect from the rest of the network because of doze mode, abrupt power-off, or permanent damage. It is therefore all the more desirable for mobile computing to be equipped with an appropriate rollback recovery scheme that minimizes the loss of computation caused by process faults. Research on rollback recovery fault-tolerant schemes for mobile computing systems has received tremendous interest in recent years, and various schemes have been presented to accommodate the characteristics of mobile computing (Agbaria et al., 2004; Brzezinsk et al., 2006; Li et al., 2005).
