Advances in Fault-Tolerant Multi-Agent Systems

Lúcio Sanchez Passos, Rosaldo J. F. Rossetti, Joaquim Gabriel
Copyright: © 2015 | Pages: 12
DOI: 10.4018/978-1-4666-5888-2.ch690

Introduction

Multi-Agent Systems (MAS) arose in the early 1980s as a promising software paradigm for complex distributed systems. The paradigm derives from a sub-field of Artificial Intelligence concerned with the concurrency of multiple intelligent problem-solvers, known as Distributed Artificial Intelligence (DAI). According to Gasser (1987), MAS “is concerned with coordinated intelligent behavior among a set of (possibly pre-existing) autonomous intelligent ‘agents:’ how they can coordinate their knowledge, goals, skills, and plans jointly to take action or solve (possibly multiple and independent) problems.” Since then, the concept of the autonomous agent has been broadly studied in diverse fields.

Despite the large number of research achievements over a couple of decades and successful industrial applications such as ARCHON (Wittig, 1992), the body of knowledge on MAS has not yet been matched by comparable growth in practical implementations in real complex distributed systems. Several works (McKean, Shorter, Luck, McBurney, & Willmott, 2008; Pěchouček & Marík, 2008) have pointed to the under-exploration of the multi-agent approach in industrial environments, and this issue motivated the foundation of the Technical Forum Group (TFG).

Seeking an answer to the question “Why are Multi-Agent Systems not widely used in real complex (distributed) systems?”, the TFG wrote a document (McBurney & Omicini, 2008) in which it pointed out several issues. One of them is the risk of adopting a new approach that has never shown its value in large-scale problems. This negative perspective comes from the industry’s “fear” of the emergent behavior of MAS operating without any central decision unit (Pěchouček & Marík, 2008). This issue alone leads industry to avoid using MAS to control critical tasks. To overcome it, MAS capabilities must be expanded to ensure that the system is prepared to deal with unexpected situations on a sound and safe basis.

Hence a critical aspect emerges from this discussion: the dependability of software systems. Software dependability is the property that relates the reliance placed on a system to the services it delivers; i.e., the overall assurance depends on how an application behaves from the perspective of its user(s) (be they human, hardware, or other software). As noted by Laprie (1995), dependability has multiple but complementary attributes, namely confidentiality, integrity, safety, availability, reliability, and maintainability. Confidentiality and integrity are associated with the security of the software and seek to protect information against unauthorized access and unauthorized modification, respectively. The remaining attributes have one point in common: they all seek to ensure reliance.

The means to achieve the desired level of reliability should be discussed from the very beginning of software construction and carried through to the validation phase. According to Pullum (2001), these methods fall into four major groups: fault avoidance, fault removal, fault forecasting, and fault tolerance. As this work focuses on runtime faults, fault tolerance techniques must be thoroughly understood, because they aim to ensure that services comply with their specification in the presence of system faults, i.e., when a fault occurs. These schemes provide mechanisms to avoid overall failure after system deployment. In the context of agents, such mechanisms have also been applied to improve reliability, with the aim of achieving a Fault-Tolerant MAS.

Key Terms in this Chapter

Multi-Agent System (MAS): A software system composed of various agents, which can autonomously reason, interact with each other, and act upon a certain environment aiming to fulfil individual and/or collective goals.

Error Recovery: This phase brings the system to an error-free state because, unless the error is removed, the incorrect state might cause a new system failure.

Threat: There are three types of threats (Avižienis et al., 2004): a failure is an event that occurs when the system (or service) behavior deviates from its specification; an error is a system state which is liable to lead to a failure; and the underlying cause of an error is a fault.

Fault Diagnosis: The diagnosis process is triggered by a detected deviation from the expected behavior of a given system; its aim is to identify the origin of the failure, i.e., to locate all contributors to the unexpected event.

Fault Tolerance: A system is fault tolerant if its behavior is consistent with its specifications, regardless of whether any component fails.

Agent: A software entity that conforms to six essential properties: reactiveness (responds to events in a timely manner), autonomy (exerts control over its own actions), goal-orientedness (aims to realize a specific task), persistency (is a continuously running process), sociability (interacts with other entities), and intelligence (reasons, in different ways, about how to solve a problem).

Software Reliability: The probability that a given piece of software maintains normal operation for a determined time t.
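
The relationship between the terms fault, error, error recovery, and fault tolerance defined above can be illustrated with a minimal Python sketch. It is not taken from the chapter: the CounterAgent class, its specification (total must equal the number of steps taken), the injected fault, and the checkpoint-and-retry recovery strategy are all hypothetical and chosen only for illustration. Rollback to a checkpoint is one common recovery technique among several.

```python
from dataclasses import dataclass


@dataclass
class CounterAgent:
    """Hypothetical agent. Assumed specification: total == steps at all times."""
    total: int = 0
    steps: int = 0
    checkpoint: tuple = (0, 0)  # last state known to be error-free

    def step(self, inject_fault: bool = False) -> None:
        # A fault (here a simulated transient bug) corrupts the internal state.
        self.total += 5 if inject_fault else 1
        self.steps += 1

    def has_error(self) -> bool:
        # Error detection: the state no longer satisfies the specification.
        return self.total != self.steps

    def save_checkpoint(self) -> None:
        self.checkpoint = (self.total, self.steps)

    def recover(self) -> None:
        # Error recovery: roll back to the last error-free state.
        self.total, self.steps = self.checkpoint


def run(agent: CounterAgent, faulty_steps: set, n_steps: int) -> int:
    """Run n_steps, rolling back and retrying whenever an error is detected."""
    recoveries = 0
    for i in range(n_steps):
        agent.step(inject_fault=(i in faulty_steps))
        if agent.has_error():
            agent.recover()
            agent.step()  # retry; the simulated fault is assumed to be transient
            recoveries += 1
        agent.save_checkpoint()
    return recoveries


agent = CounterAgent()
recoveries = run(agent, faulty_steps={2, 4}, n_steps=6)
print(agent.total, agent.steps, recoveries)  # prints: 6 6 2
```

In this sketch the two injected faults each produce an error, but the error is removed before it can surface as a failure, so the externally visible behavior stays consistent with the assumed specification. That is the sense in which the run is fault tolerant.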
