Timely Autonomic Adaptation of Publish/Subscribe Middleware in Dynamic Environments

Joe Hoffert, Aniruddha Gokhale, Douglas C. Schmidt
DOI: 10.4018/jaras.2011100101

Abstract

Quality-of-service enabled publish/subscribe (pub/sub) middleware provides powerful support for scalable data dissemination. It is difficult to maintain key quality of service properties (such as reliability and latency) in dynamic environments for distributed real-time and embedded systems (such as disaster relief operations or power grids). Managing quality of service manually is often not feasible in dynamic environments due to slow response times, the complexity of managing multiple interrelated quality of service settings, and the scale of the systems being managed. For certain domains, distributed real-time and embedded systems must be able to reflect on the conditions of their environment and adapt accordingly in a bounded amount of time. This paper describes an architecture of quality of service-enabled middleware and corresponding algorithms to support specified quality of service in dynamic environments.

Introduction

The use of publish/subscribe (pub/sub) technologies for distributed real-time and embedded (DRE) systems has grown in recent years due to the advantages of performance, cost, and scale as compared to single computers (Huang, 2006; Tarkoma, 2006). In particular, pub/sub middleware has been leveraged to ease the complexities of data dissemination for DRE systems. Examples of pub/sub middleware include the CORBA Notification Service (Ramani, 2001), the Java Message Service (JMS) (Monson-Haefel, 2000), Web Services Brokered Notification (Niblett, 2005), and the Data Distribution Service (DDS) (Pardo-Castellote, 2003). These technologies support the propagation of data and events throughout a system using an anonymous publication and subscription model that decouples event suppliers and consumers.

Pub/sub middleware is used across a wide variety of application domains, ranging from shipboard computing environments to cloud computing to stock trading. Moreover, the middleware provides policies that affect the end-to-end quality of service (QoS) of applications running in DRE systems. Policies that are common across various middleware technologies include grouped data transfer (i.e., transmitting a group of data atomically), durability (i.e., saving data for subsequent subscribers), and persistence (i.e., saving data for current subscribers).

Even though tunable policies provide fine-grained control of system QoS, several challenges emerge when developing pub/sub systems deployed in dynamic environments. Middleware mechanisms used to ensure certain QoS properties for one environment configuration may be ineffective for different configurations. For example, a simple unicast protocol, such as the User Datagram Protocol (UDP), may meet the specified latency QoS when a publisher sends to a small number of subscribers. With a large number of subscribers, however, UDP could incur too much latency because it is point-to-point: the publisher itself must send the data to each subscriber in turn.
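The fan-out cost described above can be illustrated with a rough back-of-the-envelope model. The sketch below is not taken from the article: the function names and the fixed per-send cost are assumptions made for illustration, and real send costs vary with message size, network stack, and hardware.

```python
def unicast_send_time_us(num_subscribers: int, per_send_cost_us: float) -> float:
    """Publisher-side time to issue one sample with one send per subscriber."""
    return num_subscribers * per_send_cost_us


def multicast_send_time_us(num_subscribers: int, per_send_cost_us: float) -> float:
    """Publisher-side time to issue one sample with a single multicast send."""
    # The subscriber count is accepted only to keep the two signatures parallel.
    return per_send_cost_us


# Assumed cost of 10 microseconds per send, chosen only for illustration.
for n in (3, 100):
    print(n, unicast_send_time_us(n, 10.0), multicast_send_time_us(n, 10.0))
```

Under this simplified model the last subscriber waits for every earlier send, so publisher-side delay grows linearly with the number of subscribers, while the multicast figure stays constant.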

Challenges also arise when multiple QoS policies interact with each other. For example, a system might need both low latency and high reliability, yet providing reliability can increase latency because lost data must first be detected and then recovered. Certain transport protocols, such as UDP, provide low overhead but no end-to-end reliability. Other protocols, such as the Transmission Control Protocol (TCP), provide reliability but unbounded latencies due to acknowledgment-based retransmissions. Still other protocols, such as lateral error correction protocols (Balakrishnan, 2005), manage the potentially conflicting QoS properties of reliability and low latency, but only provide benefits over other protocols in specific environment configurations.
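One reason reliability costs latency is that a receiver typically learns of a loss only when a later packet arrives. A minimal sketch of sequence-gap detection follows. It is a generic technique, not the implementation of any protocol named in this article, and `find_missing` is a hypothetical helper introduced for illustration.

```python
def find_missing(last_in_order: int, received_seq: int) -> list[int]:
    """Return the sequence numbers skipped between the last in-order
    packet and the packet that has just arrived."""
    return list(range(last_in_order + 1, received_seq))


# The receiver has delivered packets up to 3, and packet 7 arrives next.
# Only now can it tell that 4, 5, and 6 need to be recovered.
missing = find_missing(3, 7)
print(missing)
```

Recovery of the skipped packets (by retransmission or by error correction) can begin only after this detection step, which is where the added delay comes from.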

It is hard to determine when to switch from one transport protocol to another or modify parameters of a particular transport protocol so that desired QoS is maintained. Moreover, manual intervention is often not responsive enough for the timeliness requirements of the system. DRE systems operate within strict timing requirements that must be met for the systems to function appropriately. The problem of timely response is exacerbated as the scale of the system grows, e.g., as the number of publishers or subscribers increases.

This article describes how our work (1) monitors environment changes that affect QoS, (2) determines in a timely manner which transport protocol changes are needed in response to environment changes, (3) integrates the use of multiple supervised machine learning techniques to increase accuracy, and (4) autonomically adapts the network protocols used to support the desired QoS. We have prototyped this approach in the ADAptive Middleware And Network Transports (ADAMANT) platform, briefly outlined previously (Hoffert & Schmidt, 2009), which supports environment monitoring and provides timely autonomic adaptation of the middleware. ADAMANT provides the following contributions to research on autonomic configuration of pub/sub middleware in dynamic environments:

  • Leveraging anonymous publish and subscribe middleware based on the DDS specification. DDS defines topic-based high-performance pub/sub middleware to support DRE systems. ADAMANT leverages the middleware to provide environment monitoring information that is disseminated throughout the DRE system (e.g., change in sending rate, change in network percentage loss) to provide updates occurring in the operating environment.

  • Multiple supervised machine learning (SML) techniques as a knowledge base to provide fast and predictable adaptation guidance in dynamic environments. ADAMANT provides timely integrated machine learning (TIML), a novel approach to provide high accuracy and timely determination of which SML technique to use for a given operating environment.

  • Configuration of DRE pub/sub middleware based on guidance from supervised machine learning. Our ADAMANT middleware uses the adaptive network transports (ANT) framework (Hoffert, Gokhale, & Schmidt, 2009) to select the transport protocol(s) that best address multiple QoS concerns for given computing resources. ANT provides an infrastructure for composing and configuring transport protocols using modules that provide base functionality (e.g., an IP multicast module that handles multicasting the data to the network). Supported protocols include Ricochet, which uses a variation of forward error correction called lateral error correction that exchanges error correction information among receivers (Balakrishnan, 2007), and NAKcast, which uses negative acknowledgments (NAKs) to provide reliability. These protocols enable trade-offs between latency and reliability to support middleware for enterprise DRE pub/sub systems.
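The monitor, decide, and adapt loop implied by these contributions can be sketched as follows. This is a minimal illustration under stated assumptions, not ADAMANT's actual design: the `Environment` fields, the `AdaptationController` class, and `toy_advisor` are hypothetical names, and the 3% loss threshold is invented for the example rather than taken from the article. In the real system the advice would come from trained supervised machine learning models.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Environment:
    """Hypothetical snapshot of monitored operating conditions."""
    sending_rate_hz: int
    loss_percent: float
    num_receivers: int


def toy_advisor(env: Environment) -> str:
    """Stand-in for a trained model. The threshold is an invented placeholder."""
    return "Ricochet" if env.loss_percent >= 3.0 else "NAKcast"


class AdaptationController:
    """Switches the transport protocol when the advisor recommends a change."""

    def __init__(self, advisor: Callable[[Environment], str], initial: str) -> None:
        self.advisor = advisor
        self.protocol = initial
        self.switches = 0

    def on_environment_update(self, env: Environment) -> str:
        recommended = self.advisor(env)
        if recommended != self.protocol:
            # A real controller would reconfigure the transport here.
            self.protocol = recommended
            self.switches += 1
        return self.protocol


controller = AdaptationController(toy_advisor, initial="NAKcast")
controller.on_environment_update(Environment(25, 1.0, 3))  # no change expected
controller.on_environment_update(Environment(25, 5.0, 3))  # loss rises, so switch
print(controller.protocol, controller.switches)
```

Passing the advisor in as a callable keeps the controller independent of how guidance is produced, so a lookup table or a learned model could be substituted without changing the adaptation logic.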

This paper extends our prior work (Hoffert et al., 2010a, 2010b) on ADAMANT by exploring the architecture and control flow for autonomic adaptation of QoS-enabled pub/sub DRE systems. Moreover, this paper (1) empirically evaluates TIML as an approach to increase accuracy and maintain constant-time complexity, (2) leverages the DDS pub/sub infrastructure to disseminate environment changes to all data senders and receivers, (3) empirically evaluates the bounded response times of the ANT framework when adapting transport protocols, and (4) provides an autonomic controller that manages the adaptation of transport protocols to support QoS as the environment changes and details how the controller manages the adaptations. The paper is organized as follows. We start by describing a motivating example and highlighting the challenges. Next, we present the structure and functionality of the ADAMANT framework. We then detail and analyze our experimental results. We compare our work with related research in autonomic adaptation. Finally, we conclude with lessons learned.
