Improved Checkpoint Using the Effective Management of I/O in a Cloud Environment

DOI: 10.4018/978-1-5225-7598-6.ch016


One of the most important requirements for using cloud environments effectively is the reliability and robustness of the services they host. Fault tolerance is therefore necessary to ensure that reliability and to reduce SLA violations. Checkpointing is a popular fault tolerance technique in large-scale systems. However, its major disadvantage is the overhead caused by the time needed to store checkpoint files, which increases execution time and reduces the chance of meeting the desired deadlines. In this chapter, the authors propose a checkpointing strategy with lightweight storage. The storage is provided by creating a virtual topology, VRbIO, and by using an intelligent and fault-tolerant I/O technique, CSDS (collective and selective data sieving). The proposal is executed by active and reactive agents, and it solves many problems of checkpointing with standard I/O. To evaluate the approach, the authors compare it with checkpointing that uses ROMIO as its I/O strategy. Experimental results show the effectiveness and reliability of the proposed approach.
Chapter Preview


The emergence of cloud computing has brought a new dimension to the world of information technology. Although cloud computing offers several advantages such as virtualization, cost reduction, and multi-tenancy, there are risks and failures associated with it (Yang et al., 2014). A key challenge for research in cloud computing is to ensure the reliability of the system without reducing overall system performance. Checkpointing is one of the available fault tolerance strategies. Its major problem is the overhead caused by the time needed to write checkpoint files to stable storage, which is estimated at 70% of the total checkpointing time (Ouyang et al., 2009a; Cornwell & Kongmunvattana, 2011a). Figure 1 shows the main phases of the checkpointing process. The process has three phases: i) suspend communication between processes and ensure a consistent state; ii) use the checkpointing library to create and store checkpoints; iii) reconnect the processes and continue execution.

Figure 1. The time of the phases of the checkpointing process
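The three phases can be sketched in a few lines of Python. This is a minimal single-machine illustration, not the checkpointing library used in the chapter: the `Process` class, the `communicating` flag, and the `ckpt_<pid>.pkl` file naming are hypothetical stand-ins, and `pickle` replaces whatever serialization a real library would use. The returned timings make it possible to see how much of a checkpoint is spent in the storage phase.

```python
import pickle
import time
from pathlib import Path


class Process:
    """Toy stand-in for an application process (hypothetical)."""

    def __init__(self, pid, state):
        self.pid = pid
        self.state = state
        self.communicating = True


def coordinated_checkpoint(processes, directory):
    """Run the three phases once and return the seconds spent in each."""
    timings = {}

    # Phase i: suspend communication so the saved states are consistent.
    start = time.perf_counter()
    for proc in processes:
        proc.communicating = False
    timings["suspend"] = time.perf_counter() - start

    # Phase ii: create one checkpoint file per process and store it.
    start = time.perf_counter()
    for proc in processes:
        with open(Path(directory) / f"ckpt_{proc.pid}.pkl", "wb") as handle:
            pickle.dump(proc.state, handle)
    timings["store"] = time.perf_counter() - start

    # Phase iii: reconnect the processes and continue execution.
    start = time.perf_counter()
    for proc in processes:
        proc.communicating = True
    timings["reconnect"] = time.perf_counter() - start

    return timings


def restore(pid, directory):
    """Load the state saved for one process (used after a failure)."""
    with open(Path(directory) / f"ckpt_{pid}.pkl", "rb") as handle:
        return pickle.load(handle)
```

Dividing `timings["store"]` by the sum of the three values gives the share of the checkpoint spent on storage for a given run; the actual share depends on state size and the storage backend.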


The aim of our work is to minimize the overhead of checkpointing by minimizing its storage time. To achieve this goal, we improve I/O management and propose a checkpointing strategy with three phases:

  1. The construction of the VRbIO (Virtual RbIO) topology: RbIO, proposed in (Lui et al., 2010), is a virtual hierarchical topology that minimizes checkpointing time and I/O time at the same time. In our system, each VM has a reactive agent responsible for local I/O management; at the end of this phase, some of these reactive agents are activated to manage the I/O of a group of VMs on the server. The I/O thus becomes hierarchical.

  2. Creating the checkpoint files using a coordinated checkpointing protocol.

  3. Ensuring lightweight and fault-tolerant storage of these files by using Collective and Selective Data Sieving input/output (CSDS I/O), which is executed only by the active agents. CSDS improves on the ROMIO I/O strategy, which has several problems and limitations (Fu et al., 2011).
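The hierarchy built in the first phase can be pictured with a small sketch. It is only an illustration under stated assumptions: the text does not say how groups are formed or how active agents are chosen, so the fixed `group_size` and the rule "the lowest-numbered VM of each group hosts the active agent" are hypothetical, as are the function names.

```python
def build_groups(vm_ids, group_size):
    """Partition VMs into groups and pick one active agent per group.

    Assumption: the lowest-numbered VM of each group hosts the active
    agent. Returns a dict mapping each active agent to its group members.
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    ordered = sorted(vm_ids)
    groups = {}
    for start in range(0, len(ordered), group_size):
        members = ordered[start:start + group_size]
        groups[members[0]] = members
    return groups


def writers_needed(vm_ids, group_size):
    """Number of agents that access stable storage under this grouping."""
    return len(build_groups(vm_ids, group_size))
```

With ten VMs and groups of four, for instance, three active agents write to stable storage instead of ten reactive agents writing independently, which is the intuition behind making the I/O hierarchical.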

Our three-phase algorithm provides solutions for most of the issues raised by classical checkpointing with ROMIO as the I/O strategy. The rest of the chapter is organized as follows: Section 2 presents the background on I/O aggregation techniques, with a comparative study. Section 3 describes ROMIO and its features. Section 4 presents our contribution: each of its services is described in detail, and the problems identified in the previous section are addressed. Section 5 presents experimental results, followed by a conclusion and future research directions.



An important reason for the limitations of I/O systems is that applications often issue many small, disjoint requests. This access pattern incurs an initial cost from the large number of requests travelling over the various transmission channels and, more significantly, increases the time needed to process them (Sadiku et al., 2014). To deal with this problem, several "aggregation" methods have been proposed. We can distinguish two types of I/O strategy: independent and collective.

Independent I/O is a straightforward form of I/O and is widely used in parallel applications. This form of I/O can be called independently by an individual process or any subset of processes of a parallel application. The advantage of independent I/O is that users have the freedom to perform I/O for each individual process or any subset of the processes that open the file.
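The effect of aggregating small, disjoint requests can be illustrated with a short sketch. It is not the CSDS technique or ROMIO's implementation; it only shows the general idea of merging `(offset, length)` requests that touch or lie close together, in the spirit of data sieving. The function name and the `max_gap` parameter are assumptions introduced for illustration.

```python
def coalesce(requests, max_gap=0):
    """Merge (offset, length) requests whose gap is at most max_gap bytes.

    With max_gap=0 only touching or overlapping requests are merged.
    A larger max_gap also merges requests separated by small holes, at
    the price of transferring the unused bytes in between.
    """
    merged = []  # list of [start, end) intervals, kept sorted by start
    for offset, length in sorted(requests):
        if length <= 0:
            continue  # ignore empty requests
        end = offset + length
        if merged and offset - merged[-1][1] <= max_gap:
            # Close enough to the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([offset, end])
    return [(start, end - start) for start, end in merged]
```

For the four requests `[(0, 4), (4, 4), (16, 4), (10, 2)]`, `coalesce` with the default gap yields three requests, and `max_gap=4` yields a single 20-byte request. Fewer, larger requests reduce per-request overhead, while a larger gap trades extra transferred bytes for fewer operations.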
