An Automatic Recovery Mechanism for Cloud Service Composition

Wenrui Li (School of Mathematics & Information Technology, Nanjing Xiaozhuang University, Nanjing, China & State Key Laboratory of Software Engineering, Wuhan University, Wuhan, China), Yan Cheng (College of Computer and Information, Hohai University, Nanjing, China), Pengcheng Zhang (College of Computer and Information, Hohai University, Nanjing, China) and Hareton Leung (Department of Computing, Hong Kong Polytechnic University, Hong Kong, China)
Copyright: © 2016 | Pages: 17
DOI: 10.4018/IJWSR.2016010102


Cloud computing, with its characteristics of large-scale computation, data storage, visualization, high expansibility and elasticity, provides a powerful computing paradigm. Cloud services can be rapidly composed into on-demand composite services that fulfill users' requirements. However, the uncertainty of cloud services affects the correctness and reliability of the composite services. In particular, unanticipated hardware and software failures make it very difficult to assure the quality of the composite services. In complex cloud computing environments, recovering composite services from these failures is a challenging issue. The paper first presents a unified fault taxonomy for the three layers of cloud computing and analyzes the causes of the faults. The authors then propose a hierarchical recovery mechanism that includes five different recovery algorithms for various kinds of failures. Finally, through simulation experiments, they show that the proposed approach is effective and practical.
Article Preview


Nowadays, cloud computing can effectively reconfigure various resources to offer composite services that meet the dynamic needs of users. Users focus not only on the functional properties of the composite services, but also on non-functional properties (Grunske & Zhang, 2009; Zhang, Li, Wan, & Grunske, 2011), such as reliability, availability, and security. Because the cloud environment is complex, a variety of uncertainties may affect the quality of cloud services. For example, individual cloud services that are distributed on the Internet, provided by different organizations, and running on different system platforms may generate anomalies. Unpredictable faults can also prevent the composite services from running correctly. Ensuring that the composite services remain in a consistent state even in the presence of failures is a challenging problem.

Several approaches have been proposed to recover from failures of cloud services. However, current work lacks a comprehensive understanding of the causes and effects of faults in the complex cloud computing environment. Some approaches propose service recovery strategies only for specific types of faults in a certain layer of cloud computing, and do not consider failures in all three layers at the same time (Juhnke, Dornemann, & Freisleben, 2009; Mdhaffar, Halima, Juhnke, Jmaiel, & Freisleben, 2011; Nallur & Bahsoon, 2013; Ramakrishnan, Koelbel, Kee, et al., 2009). For example, Mdhaffar et al. (2011) present the recovery of SaaS services from failures using the Aop4csm approach. Juhnke et al. (2009) propose recovering from IaaS failures using a policy-based approach. In addition, these recovery approaches do not take service granularity into account: they are suitable only for basic services, not for composite services. Thus, these approaches offer no comprehensive recovery framework for failures that occur in different cloud layers.

In this paper, we first identify the causes and effects of faults in the cloud computing environment, and analyze the relationship between faults and failures. A unified fault taxonomy is presented for the three layers of cloud computing, in which the faults are related to the infrastructure layer, the platform layer, and the software layer. We then propose a hierarchical recovery framework in which a series of recovery strategies is applied to these failures. The recovery strategies also depend on service granularity, that is, on whether the failed service is a basic service or a composite service. For basic services there are four recovery strategies, chosen according to the fault cause: undo, redo, substitute with undo, and substitute without undo. The recovery strategy for composite services is recompose.
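The split between basic and composite services can be made concrete with a minimal Python sketch. Only the three layer names, the two granularities, and the five strategy names come from the text above. The `Failure` record, its `retryable` and `state_changed` flags, and the rule that picks among the four basic-service strategies are hypothetical illustrations, since this preview does not state how fault causes map to strategies.

```python
from dataclasses import dataclass
from enum import Enum


class Layer(Enum):
    """The three cloud layers covered by the fault taxonomy."""
    INFRASTRUCTURE = "infrastructure"
    PLATFORM = "platform"
    SOFTWARE = "software"


class Granularity(Enum):
    BASIC = "basic"
    COMPOSITE = "composite"


class Strategy(Enum):
    """The five recovery strategies named in the text."""
    UNDO = "undo"
    REDO = "redo"
    SUBSTITUTE_WITH_UNDO = "substitute with undo"
    SUBSTITUTE_WITHOUT_UNDO = "substitute without undo"
    RECOMPOSE = "recompose"


@dataclass(frozen=True)
class Failure:
    """Hypothetical failure record; the two flags are assumptions."""
    layer: Layer
    granularity: Granularity
    retryable: bool       # assumed: the same service can be invoked again
    state_changed: bool   # assumed: the failed call left partial effects


def choose_strategy(failure: Failure) -> Strategy:
    """Pick a recovery strategy for a failure.

    From the text: composite services are recovered by recompose, and
    basic services by one of four strategies. The decision rule used
    for basic services below is illustrative only.
    """
    if failure.granularity is Granularity.COMPOSITE:
        return Strategy.RECOMPOSE
    if failure.retryable:
        # Assumed rule: roll back partial effects, otherwise just retry.
        return Strategy.UNDO if failure.state_changed else Strategy.REDO
    # Assumed rule: the service is unusable, so replace it.
    if failure.state_changed:
        return Strategy.SUBSTITUTE_WITH_UNDO
    return Strategy.SUBSTITUTE_WITHOUT_UNDO


if __name__ == "__main__":
    example = Failure(Layer.SOFTWARE, Granularity.BASIC,
                      retryable=False, state_changed=True)
    print(choose_strategy(example).value)
```

Keeping the granularity check first mirrors the hierarchical framing in the text: the composite case is decided before any fault-cause detail is examined.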

The contributions of the paper are summarized as follows:

  • The relationship between faults and failures is analyzed, and a taxonomy of faults in the three layers of cloud computing is presented;

  • According to different service granularity, five recovery strategies are proposed for basic services and composite services, respectively;

  • A simulation system for failure recovery of cloud service composition, named CSFRS (Cloud Service Failure Recovery System), is developed. Experiments on the simulation system are performed to validate our proposed recovery algorithms.
