Fault-Tolerant Protocols Using Single- and Multiple-Version Software Fault-Tolerance

Vincenzo De Florio (PATS Research Group, University of Antwerp and iMinds, Belgium)
Copyright: © 2009 | Pages: 80
DOI: 10.4018/978-1-60566-182-7.ch003

Abstract

This chapter discusses two large classes of fault-tolerance protocols: single-version protocols, that is, methods that use a non-distributed, single-task provision running side by side with the functional software, often available in the form of a library and a run-time executive; and multiple-version protocols, that is, methods that actively use a form of redundancy. In particular, recovery blocks and N-version programming are discussed. The two families are grouped together in this chapter because of the several similarities they share.
Chapter Preview

Introduction And Objectives

This chapter discusses two large classes of fault-tolerance protocols:

  • Single-version protocols, that is, methods that use a non-distributed, single-task provision running side by side with the functional software, often available in the form of a library and a run-time executive.

  • Multiple-version protocols, that is, methods that actively use a form of redundancy, as explained in what follows. In particular, recovery blocks and N-version programming will be discussed.

The two families have been grouped together in this chapter because of the several similarities they share.

A key requirement for the development of fault-tolerant systems is the availability of replicated resources, in hardware or software. A fundamental method employed to attain fault-tolerance is multiple computation, i.e., N-fold (N > 1) replications in three domains:

  • Time: that is, the repetition of computations.

  • Space: that is, the adoption of multiple hardware channels (also called “lanes”).

  • Information: that is, the adoption of multiple versions of software.

Following Avižienis (1985), it is possible to characterize at least some of the approaches towards fault-tolerance by means of a notation resembling the one used to classify queuing systems models (Kleinrock, 1975):

nT/mH/pS,

the meaning of which is “n executions, on m hardware channels, of p programs”. The non-fault-tolerant system, or 1T/1H/1S, is called simplex in the cited paper.
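As an illustration only, the nT/mH/pS notation can be modelled as a small Python record. This is a minimal sketch and not code from the chapter: the class name `Redundancy`, its field names, and the helper methods are assumptions introduced here, and the configurations in the usage lines are hypothetical examples rather than named schemes.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Redundancy:
    """Hypothetical model of an nT/mH/pS descriptor (names are assumptions)."""

    executions: int  # n: repetitions of the computation (time domain)
    channels: int    # m: hardware channels, or "lanes" (space domain)
    programs: int    # p: versions of the software (information domain)

    @classmethod
    def parse(cls, text: str) -> Redundancy:
        # Accept strings such as "1T/3H/3S", tolerating spaces around "/".
        match = re.fullmatch(r"\s*(\d+)T\s*/\s*(\d+)H\s*/\s*(\d+)S\s*", text)
        if match is None:
            raise ValueError(f"not in nT/mH/pS form: {text!r}")
        n, m, p = (int(group) for group in match.groups())
        if min(n, m, p) < 1:
            raise ValueError("n, m and p must all be at least 1")
        return cls(n, m, p)

    def is_simplex(self) -> bool:
        # 1T/1H/1S: no replication in any of the three domains.
        return (self.executions, self.channels, self.programs) == (1, 1, 1)

    def replicated_domains(self) -> list[str]:
        # List the domains in which the replication factor exceeds 1.
        domains = []
        if self.executions > 1:
            domains.append("time")
        if self.channels > 1:
            domains.append("space")
        if self.programs > 1:
            domains.append("information")
        return domains

    def __str__(self) -> str:
        return f"{self.executions}T/{self.channels}H/{self.programs}S"


# Hypothetical usage: a simplex system and a system replicated in two domains.
simplex = Redundancy.parse("1T/1H/1S")
replicated = Redundancy.parse("1T/3H/3S")
print(simplex, simplex.is_simplex())
print(replicated, replicated.replicated_domains())
```

Keeping the three factors as separate integer fields mirrors the notation directly, so a descriptor can be checked for replication in each domain independently.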
