Conclusion

Conclusion

Vincenzo De Florio (PATS Research Group, University of Antwerp and iMinds, Belgium)
Copyright: © 2009 |Pages: 24
DOI: 10.4018/978-1-60566-182-7.ch011
OnDemand PDF Download:
$37.50

Abstract

We have reached the end of our discussion about application-level fault-tolerance protocols, which were defined as the methods, architectures, and tools that allow the expression of fault-tolerance in the application software of our computers. Several “messages” have been given: • First of all, fault-tolerance is a “pervasive” concern, spanning the whole of the system layers. Neglecting one layer, for instance the application, means leaving a backdoor open for problems. • Next, fault-tolerance is not abstract: It is a function of the target platform, the target environment, and the target quality of service. The tools to deal with this are the system model and the fault model, plus the awareness that (1) all assumptions have coverage and (2) coverage means that, sooner or later, maybe quite later but “as sure as eggs is eggs,” cases will show up where each coverage assumptions will fail. • This means that there is a (even ethical) need to design our systems thinking of the consequences of coverage failures at mission time, especially considering safety critical missions. I coined a word for those supposed fault-tolerant software engineers that do not take this need into account: Endangeneers. Three well-known accidents have been presented and interpreted in view of coverage failures in the fault and system models. • Next, the critical role of the system structure for the expression of fault-tolerance in computer applications was put forth: From this stemmed the three properties characterizing any application-level fault-tolerance protocol: Separation of concerns, adequacy to host different solutions, and support for adaptability. Those properties address the following question: Given a certain fault-tolerance provision, is it able to guarantee an adequate separation of the functional and non-functional design concerns? Does it tolerate a fixed set of faulty scenarios, or does it dynamically change that set? And, is it flexible enough as to host a large number of different strategies? • Then it has been shown that there exist a large number of techniques and hence of system structures able to enhance the fault-tolerance of the application. Each of these techniques has its pros and cons, which we tried to point out as best as we could. We also attempted to qualify each technique with respect to the above mentioned properties1. A summary of the results of this process is depicted in Fig. 1. • Another key message is that complexity is a threat to dependability, and we must make sure that the extra complexity to manage fault-tolerance does not become another source of potential failures. In other words, simplicity must be a key ingredient of our fault-tolerance protocols, and a faulty fault-tolerant software may produce the same consequence of a faulty non fault-tolerant software—or maybe direr. • Finally, we showed with some examples that adaptive behavior is the only way to match the ever mutating and unstable environments characterizing mobile systems. As an example, static designs would make bad use of the available redundancy.
Chapter Preview
Top

An Introduction And Some Conclusions

We have reached the end of our discussion about application-level fault-tolerance protocols, which were defined as the methods, architectures, and tools that allow the expression of fault-tolerance in the application software of our computers. Several “messages” have been given:

  • First of all, fault-tolerance is a “pervasive” concern, spanning the whole of the system layers. Neglecting one layer, for instance the application, means leaving a backdoor open for problems.

  • Next, fault-tolerance is not abstract: It is a function of the target platform, the target environment, and the target quality of service. The tools to deal with this are the system model and the fault model, plus the awareness that (1) all assumptions have coverage and (2) coverage means that, sooner or later, maybe quite later but “as sure as eggs is eggs,” cases will show up where each coverage assumptions will fail.

  • This means that there is a (even ethical) need to design our systems thinking of the consequences of coverage failures at mission time, especially considering safety critical missions. I coined a word for those supposed fault-tolerant software engineers that do not take this need into account: Endangeneers. Three well-known accidents have been presented and interpreted in view of coverage failures in the fault and system models.

  • Next, the critical role of the system structure for the expression of fault-tolerance in computer applications was put forth: From this stemmed the three properties characterizing any application-level fault-tolerance protocol: Separation of concerns, adequacy to host different solutions, and support for adaptability. Those properties address the following question: Given a certain fault-tolerance provision, is it able to guarantee an adequate separation of the functional and non-functional design concerns? Does it tolerate a fixed set of faulty scenarios, or does it dynamically change that set? And, is it flexible enough as to host a large number of different strategies?

  • Then it has been shown that there exist a large number of techniques and hence of system structures able to enhance the fault-tolerance of the application. Each of these techniques has its pros and cons, which we tried to point out as best as we could. We also attempted to qualify each technique with respect to the above mentioned properties1. A summary of the results of this process is depicted in Figure 1.

  • Another key message is that complexity is a threat to dependability, and we must make sure that the extra complexity to manage fault-tolerance does not become another source of potential failures. In other words, simplicity must be a key ingredient of our fault-tolerance protocols, and a faulty fault-tolerant software may produce the same consequence of a faulty non fault-tolerant software—or maybe direr.

  • Finally, we showed with some examples that adaptive behavior is the only way to match the ever mutating and unstable environments characterizing mobile systems. As an example, static designs would make bad use of the available redundancy.

    Figure 1.

    Application-level software fault-tolerance protocols according to the three structural attributes SC, SA, and A

Complete Chapter List

Search this Book:
Reset
Table of Contents
Acknowledgment
Chapter 1
Vincenzo De Florio
The general objective of this chapter is to introduce the basic concepts and terminology of the domain of dependability. Concepts such as... Sample PDF
Dependability and Fault-Tolerance: Basic Concepts and Terminology
$37.50
Chapter 2
Vincenzo De Florio
After having described the main characteristics of dependability and fault-tolerance, it is analyzed here in more detail what it means that a... Sample PDF
Fault-Tolerant Software: Basic Concepts and Terminology
$37.50
Chapter 3
Vincenzo De Florio
This chapter discusses two large classes of fault-tolerance protocols: • Single-version protocols, that is, methods that use a non-distributed... Sample PDF
Fault-Tolerant Protocols Using Single- and Multiple-Version Software Fault-Tolerance
$37.50
Chapter 4
Vincenzo De Florio
In this chapter our survey of methods and structures for application-level fault-tolerance continues, getting closer to the programming language... Sample PDF
Fault-Tolerant Protocols Using Compilers and Translators
$37.50
Chapter 5
Vincenzo De Florio
The programming language itself is the focus of this chapter: Fault-tolerance is not embedded in the program (as it is the case e.g. for... Sample PDF
Fault-Tolerant Protocols Using Fault-Tolerance Programming Languages
$37.50
Chapter 6
Vincenzo De Florio
After having discussed the general approach of fault-tolerance languages and their main features, the focus is now set on one particular case: The... Sample PDF
The Recovery Language Approach
$37.50
Chapter 7
Vincenzo De Florio
This chapter resumes our survey of application-level fault-tolerance protocols considering approaches based on aspect-oriented programming.... Sample PDF
Fault-Tolerant Protocols Using Aspect Orientation
$37.50
Chapter 8
Vincenzo De Florio
Failure detection is a fundamental building block to develop fault-tolerant distributed systems. Accurate failure detection in asynchronous systems... Sample PDF
Failure Detection Protocols in the Application Layer
$37.50
Chapter 9
Hybrid Approaches  (pages 275-300)
Vincenzo De Florio
This chapter describes some hybrid approaches for application-level software fault-tolerance. All the approaches reported in the rest of this... Sample PDF
Hybrid Approaches
$37.50
Chapter 10
Vincenzo De Florio
As mentioned in Chapter I, a service’s dependability must be justified in a quantitative way and proved through extensive on-field testing and fault... Sample PDF
Measuring and Assessing Tools
$37.50
Chapter 11
Conclusion  (pages 326-349)
Vincenzo De Florio
We have reached the end of our discussion about application-level fault-tolerance protocols, which were defined as the methods, architectures, and... Sample PDF
Conclusion
$37.50
About the Author