Optimal Fault Tolerance Strategy Selection for Web Services


Zibin Zheng (The Chinese University of Hong Kong, China) and Michael R. Lyu (The Chinese University of Hong Kong, China)
DOI: 10.4018/978-1-4666-1942-5.ch010


Service-oriented systems are usually composed of heterogeneous Web services, which are distributed across the Internet and provided by different organizations. Building highly reliable service-oriented systems is challenging due to the highly dynamic nature of Web services. In this paper, the authors apply software fault tolerance techniques to Web services, where component failures are handled by fault tolerance strategies, and propose a distributed fault tolerance strategy evaluation and selection framework based on versatile fault tolerance techniques. The authors provide a systematic comparison of various fault tolerance strategies through theoretical formulas as well as real-world experiments. The paper also presents an optimal fault tolerance strategy selection algorithm, which employs both the QoS performance of Web services and the requirements of service users to select the optimal fault tolerance strategy. A prototype is implemented, and real-world experiments are conducted to illustrate the advantages of the evaluation framework. In these experiments, users from six different locations evaluate Web services distributed in six countries, executing over 1,000,000 test cases in a collaborative manner to demonstrate the effectiveness of the approach.

1. Introduction

Web services are self-contained, self-describing, and loosely coupled computational components designed to support machine-to-machine interaction through programmatic Web method calls, which allow structured data to be exchanged with remote resources. In the environment of service-oriented computing (Zhang et al., 2007), complex service-oriented systems are usually built by dynamically and automatically composing distributed Web service components. Since these components are usually provided by different organizations and may easily become unavailable in the unpredictable Internet environment, it is difficult to build highly reliable service-oriented systems from distributed Web services. Yet reliability is a major concern when service-oriented systems are applied to critical domains such as e-commerce and e-government. There is thus an urgent need for practical reliability enhancement techniques for service-oriented systems.

By tolerating component faults, software fault tolerance is an important approach for building reliable systems and for reducing expensive roll-back operations in long-running business processes. One approach to software fault tolerance, also known as design diversity, is to employ functionally equivalent yet independently designed program versions to tolerate faults (Lyu, 1995). This once-expensive approach has now become a viable solution in the fast-growing service-oriented computing arena, since distributed Web services with overlapping or equivalent functionalities are usually developed independently by different organizations. These alternative Web services can be obtained from the Internet and employed to construct diversity-based fault-tolerant service-oriented systems. With fault tolerance techniques, roll-backs of long-running business processes can be reduced, since component failures can be tolerated by invoking alternative candidates (other Web services). Although a number of fault tolerance strategies have been proposed for building reliable traditional systems (Lyu, 1995), systematic and comprehensive studies of software fault tolerance techniques for transactional Web services are still missing in the fast-growing field of service computing.
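As a rough illustration of design diversity applied to Web services, the sketch below invokes functionally equivalent services one after another and returns the first successful response. The callables, the `invoke_with_diversity` helper, and the `AllAlternativesFailed` exception are hypothetical names chosen for this example. The sketch assumes that each alternative signals a failure by raising an exception, and it is not the chapter's implementation.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


class AllAlternativesFailed(Exception):
    """Hypothetical error raised when every equivalent service has failed."""


def invoke_with_diversity(alternatives: Sequence[Callable[[], T]]) -> T:
    """Try functionally equivalent services in order; return the first success."""
    errors: List[Exception] = []
    for call in alternatives:
        try:
            return call()
        except Exception as exc:  # any exception is treated as a component failure
            errors.append(exc)
    # Every alternative failed: report all collected errors to the caller.
    raise AllAlternativesFailed(errors)


# Usage with two stand-in "services" (plain functions used for illustration).
def primary() -> str:
    raise ConnectionError("primary unavailable")


def backup() -> str:
    return "quote: 42.0"


print(invoke_with_diversity([primary, backup]))  # prints "quote: 42.0"
```

Because the failure of `primary` is absorbed by falling back to `backup`, the caller never sees the fault, which is the property that lets long-running processes avoid a roll-back.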

When applying fault tolerance techniques to the service-oriented systems, several challenges need to be addressed:

  • The commonly-used fault tolerance strategies should be identified and their performance needs to be investigated and compared extensively by theoretical analysis and real-world experiments.

  • Quality-of-service (QoS) values of the Web services are needed to determine the optimal fault tolerance strategy. However, some non-functional QoS properties of Web services (e.g., response time and failure rate) are location-dependent and difficult to obtain.

  • Feasible approaches for selecting the optimal fault tolerance strategy are needed, since the Internet is highly dynamic and the performance of Web services changes frequently. Moreover, the optimal fault tolerance strategy is application-dependent and subject to user preferences.

In this paper, we present a distributed fault tolerance strategy evaluation and selection framework for Web services, which is designed and implemented as WS-DREAM (Distributed REliability Assessment Mechanism for Web Service) (Zheng & Lyu, 2008a, 2008b). In WS-DREAM, the QoS performance of Web services is obtained via user collaboration, and the optimal fault tolerance strategy is determined so as to optimize the performance of the service-oriented system under a given set of user requirements. The contributions of this paper are threefold:

  • Identify various commonly-used fault tolerance strategies and design a distributed evaluation framework for Web services.

  • Propose a dynamic optimal fault tolerance strategy selection algorithm, which can be automatically reconfigured at runtime.

  • Implement a working prototype and conduct large-scale real-world experiments, in which more than 1,000,000 Web service invocations are executed by 6 distributed service users in different locations on 8 Web services located in different countries.
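To make QoS-driven strategy selection more concrete, the following is a minimal sketch, not the WS-DREAM selection algorithm. It assumes that each candidate service has a known failure probability and a fixed response time, that failures are independent, and that a failed call takes as long as a successful one. Under these assumptions it estimates a single-service strategy, a sequential strategy (try alternatives in order), and a parallel strategy (invoke all alternatives at once and take the first success), then picks the fastest strategy that satisfies hypothetical user limits on failure probability, response time, and concurrent invocations. All names (`ServiceQoS`, `select_strategy`, and so on) are illustrative.

```python
from dataclasses import dataclass
from math import prod
from typing import List, Optional


@dataclass(frozen=True)
class ServiceQoS:
    name: str
    failure_prob: float   # probability that a single invocation fails (0..1)
    response_time: float  # assumed fixed response time of one invocation (ms)


@dataclass(frozen=True)
class StrategyEstimate:
    strategy: str
    failure_prob: float    # probability that the whole strategy fails
    expected_time: float   # expected response time of the strategy (ms)
    concurrent_calls: int  # maximum number of simultaneous invocations


def estimate_single(s: ServiceQoS) -> StrategyEstimate:
    # No fault tolerance: one invocation of one service.
    return StrategyEstimate(f"single:{s.name}", s.failure_prob, s.response_time, 1)


def estimate_sequential(services: List[ServiceQoS]) -> StrategyEstimate:
    # Alternative i runs only if all earlier ones failed, so its time
    # is weighted by the product of the earlier failure probabilities.
    expected = sum(
        s.response_time * prod(p.failure_prob for p in services[:i])
        for i, s in enumerate(services)
    )
    failure = prod(s.failure_prob for s in services)
    return StrategyEstimate("sequential", failure, expected, 1)


def estimate_parallel(services: List[ServiceQoS]) -> StrategyEstimate:
    # All alternatives are invoked at once (assumes a non-empty list);
    # the strategy responds with the fastest successful reply.
    ordered = sorted(services, key=lambda s: s.response_time)
    expected = 0.0
    for k, s in enumerate(ordered):
        faster_all_failed = prod(p.failure_prob for p in ordered[:k])
        expected += s.response_time * (1 - s.failure_prob) * faster_all_failed
    failure = prod(s.failure_prob for s in ordered)
    # If every call fails, the outcome is known only after the slowest one.
    expected += ordered[-1].response_time * failure
    return StrategyEstimate("parallel", failure, expected, len(ordered))


def select_strategy(
    services: List[ServiceQoS],
    max_failure_prob: float,
    max_time: float,
    max_concurrent: int,
) -> Optional[StrategyEstimate]:
    candidates = [estimate_single(s) for s in services]
    candidates += [estimate_sequential(services), estimate_parallel(services)]
    feasible = [
        c for c in candidates
        if c.failure_prob <= max_failure_prob
        and c.expected_time <= max_time
        and c.concurrent_calls <= max_concurrent
    ]
    if not feasible:
        return None  # no strategy satisfies the user's requirements
    # Prefer the fastest strategy; break ties by using fewer concurrent calls.
    return min(feasible, key=lambda c: (c.expected_time, c.concurrent_calls))


services = [ServiceQoS("A", 0.1, 100.0), ServiceQoS("B", 0.2, 200.0)]
choice = select_strategy(services, max_failure_prob=0.05, max_time=150.0, max_concurrent=1)
print(choice.strategy)  # prints "sequential"
```

With services A (failure probability 0.1, 100 ms) and B (0.2, 200 ms), the sketch estimates the sequential strategy at a failure probability of 0.02 and 120 ms, and the parallel strategy at 0.02 and 110 ms. When the user allows only one concurrent call, the sequential strategy is chosen. Raising that limit to two makes the parallel strategy preferable, which mirrors the point that the optimal strategy depends on user requirements.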
