Resilient and Timely Event Dissemination in Publish/Subscribe Middleware

Christian Esposito, Domenico Cotroneo
DOI: 10.4018/978-1-4666-0255-7.ch001

Abstract

Recently, we have witnessed an increasing demand for fault-tolerant communication in publish/subscribe middleware. Although several reliable solutions have been proposed, none of them addresses the problem of achieving event dissemination that is both resilient and timely. We investigate how to guarantee message dissemination despite the occurrence of network faults without breaking temporal constraints. The contribution of this article is a forward error correction (FEC) approach in which the encoding functionality is placed at the root and on a subset of the interior nodes of the multicast tree. Simulation-based experiments demonstrate that the proposed approach allows all interested subscribers to receive all published messages, and that the adopted resiliency means does not affect the performance of the multicast protocol.

Introduction

A report produced by Carnegie Mellon University’s Software Engineering Institute in June 2006, titled “Ultra-Large-Scale Systems: The Software Challenge of the Future” (Pollak et al., 2006), describes a major evolution in the architecture of software systems. From the small, monolithic and vertical architectures that characterized traditional systems, there has been a shift toward large, highly modular, autonomous, heterogeneous and integrated systems of systems. A key aspect of this novel approach to architecting critical systems is that traditional systems were implemented using dedicated networks and hardware, whereas the novel ones are distributed over best-effort Wide-Area Networks (WANs), such as the Internet. The Internet, however, is a very challenging environment in which to disseminate data, since it is exposed to several kinds of failures.

An example is offered by the so-called Large-scale Complex Critical Infrastructures (LCCIs), i.e., Internet-scale federations of several autonomous and heterogeneous systems that work collaboratively to provide critical facilities. The core of an LCCI is a middleware solution that provides a multicast service to glue together several systems. The quality of the overall LCCI is therefore directly related to the quality of the adopted middleware solution. Since LCCIs are composed of several geographically distributed systems, one of the features that the adopted middleware must exhibit is scalability. Moreover, the middleware has to address two important challenges that LCCIs impose on it: due to its critical nature, an LCCI has to exhibit fault-tolerance capabilities and to respect strict deadlines for completing an operation. The data dissemination among the systems composing an LCCI therefore needs to be reliable and timely even when several kinds of faults occur.

Middleware that implements a publish/subscribe interaction model (Eugster, Felber, Guerraoui & Kermarrec, 2003) is an attractive, scalable solution for disseminating data in architectures such as LCCIs, because its strong decoupling properties enhance the degree of scalability achievable by the middleware. Although a considerable amount of work has been done on reliable publish/subscribe middleware, achieving event distribution that is both reliable and timely is still an open issue. Specifically, choosing the most suitable approach to achieve resiliency and timeliness in event delivery is not trivial. Multicast protocols have been the focus of intense research activity in the last decade, and several approaches have been proposed by academia and industry (Obraczka, 1998)-(Hosseini, Ahmed, Shirmohammadi & Georganas, 2007). Such protocols can be divided into two classes depending on where the multicast functionality is performed: transport-level multicast (TLM) (Obraczka, 1998) and application-level multicast (ALM) (Hosseini, Ahmed, Shirmohammadi & Georganas, 2007). TLMs are based on the use of IP Multicast. Although IP Multicast is an efficient solution, because it minimizes the number of packets that need to be exchanged to perform a multicast operation, it is not suitable for multicasting over the Internet because of two main drawbacks. On the one hand, IP Multicast has not been extensively deployed in the current Internet infrastructure, so its use is limited to a few portions of the overall Internet (Diot, Levine, Lyles, Kassem & Balensiefen, 2000). On the other hand, it is well known that IP Multicast-based solutions are limited in how well they handle reliable communication among a large number of geographically distributed end-points (the so-called problem of Reliability versus Scalability) (Diot, Levine, Lyles, Kassem & Balensiefen, 2000).

ALMs deploy multicast capabilities in the applications, or end-systems, instead of in the routers, and disseminate messages through an overlay network. The end-systems that participate in a multicast session organize themselves into an overlay topology (Hosseini, Ahmed, Shirmohammadi & Georganas, 2007), whose edges correspond to unicast paths between two end-systems in the underlying Internet. Overlay ALMs have been shown not to be affected by the reliability versus scalability problem, so they appear to be a suitable solution for reliable multicast dissemination over the Internet (Baccelli, Chaintreau, Liu, Riabov & Sahu, 2004). However, even though the issue of achieving both fault-tolerance and timeliness guarantees in overlay ALMs has been widely researched, it remains unsolved, because techniques for tolerating network failures tend to slow down delivery.
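
The overlay dissemination described above can be illustrated with a minimal Python sketch. It is not the chapter's protocol: the tree layout, the node names and the `disseminate` function are hypothetical, and the sketch assumes a loss-free network. It only shows that, in an ALM, each end-system forwards an event to its children in the overlay tree, and that every tree edge costs one unicast transmission.

```python
from collections import deque

def disseminate(tree, root, event):
    """Forward `event` from `root` to every end-system of an overlay tree.

    `tree` maps each end-system to the list of its children. Every
    parent-to-child edge stands for one unicast path in the underlying
    network (assumed loss-free in this sketch).
    Returns the events delivered per node and the list of unicast sends.
    """
    delivered = {root: event}
    sends = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in tree.get(node, []):
            sends.append((node, child))   # one unicast transmission per edge
            delivered[child] = event
            queue.append(child)
    return delivered, sends

# Hypothetical overlay: the publisher is the root of the multicast tree.
overlay = {"publisher": ["A", "B"], "A": ["C", "D"], "B": ["E"]}
delivered, sends = disseminate(overlay, "publisher", "evt-1")
print(sorted(delivered))   # end-systems that received the event
print(len(sends))          # number of unicast sends, one per overlay edge
```

Because forwarding is done by the end-systems themselves, no router support is needed; the price is that a lost message on one overlay edge affects the whole subtree below it, which is where resiliency techniques come into play.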
