Distributed Indexing Networks for Efficient Large-Scale Group Communication

George V. Popescu (University “Politehnica” Bucharest, Romania)
DOI: 10.4018/978-1-61520-686-5.ch015

Abstract

Recently a new category of communication network paradigms has emerged, including overlay networks for content distribution and group communication, application-level multicast, and distributed hash tables for efficient indexing and look-up of network resources. As these ideas mature, new Internet architectures emerge. The authors describe here an autonomic, self-optimizing network virtualization middleware architecture designed for large-scale distributed applications. The proposed architecture uses end-hosts and proxies at the edge of the network as the forwarding nodes for distributing content to multiple receivers using simple point-to-point communication. Routing nodes can process the content prior to forwarding in order to meet the heterogeneous requirements of receivers. The proposed architecture builds upon a new network abstraction. Distributed indexing networks (DINs) are a new paradigm of communication network design that relies on assigning indices to communication entities, communication infrastructure nodes, and distributed infrastructure resources in order to control and disseminate information. DINs are in essence overlay networks whose topology is defined by a set of connectivity rules on the indices assigned to network nodes. DINs route data packets using network indices (identifiers) and descriptors contained in the application-level routing header; messages are routed hop by hop by querying an application-level routing index structure at each node. As an application of DINs, the authors present an index-based routing multicast protocol together with its distribution tree optimization algorithm. To support applications involving large dynamic multicast groups, the application-level multicast scheme uses hierarchical group membership aggregation and stateless forwarding within clusters of network nodes.

The authors define the information space (IS) as the multidimensional space that indexes all information available in the network. This information includes infrastructure information (network node addresses, storage node locations), network measurement data, distributed content descriptors, communication group identifiers, real-time published streams, and other application-dependent communication semantics. The entity communication interest (ECI) is the vector describing the time-dependent information preferences of a network entity (multicast group client, user, etc.). The communication control architecture partitions the IS into interest cells mapped to multicast communication groups. The proposed control algorithm uses proximity-based clustering of network nodes and hierarchical communication interest aggregation to achieve scalability. The authors show that large-scale group communication in the proposed distributed indexing networks requires low computation overhead with a controlled degradation of the end-to-end data path performance.

Introduction

Scalability is an important design consideration in emerging large-scale group communication applications. Many of these applications (e.g., distributed look-up, large-scale content distribution, group collaboration, data publishing/subscribing) require efficient data path control algorithms (El-Sayed et al., 2003; Taylor et al., 2004). In addition, group communication applications have quality-of-service constraints that require efficient data routing. This places large-scale group communication among the most challenging applications distributed on overlay networks, and it calls for new architectures and data communication algorithms. In this chapter we introduce an abstraction of communication services that can be used to design high-performance, self-managing Internet-scale applications. We analyze a Distributed Indexing Network and the associated algorithms for group communication. The analysis gives a perspective on the potential of virtualized, service-oriented Internet architectures.

The virtualized architecture we consider here depends on the choice of group communication representation. Two models of group management are prevalent: clients are grouped (a) statically, by matching their interest to a fixed partitioning of the information space, or (b) dynamically, by propagating their communication interest in a control hierarchy (Chang et al., 2002; Wong et al., 2000). Here we model the communication interest as a multi-dimensional space dynamically partitioned into optimum-size cells mapped to communication groups. The design of scalable data distribution network architectures for large-scale group communication has multiple objectives: (1) grouping participants according to their communication interest, (2) organizing the data path to guarantee end-to-end network latency, and (3) reducing the signaling overhead generated by frequent changes in client multicast group membership. Efficient group communication requires minimizing the communication capacity wasted when multicasting messages to groups of clients with similar interest. Various methods of interest-based grouping have been proposed without considering the constraints imposed by the communication infrastructure. Recent work has proposed DHT-based solutions that decompose the information space so that each attribute (dimension) is filtered independently; this introduces an overhead that scales linearly with the number of attributes. Others have modeled the group communication interest as a topic-based publish/subscribe relation in which the receivers specify their interest in sub-domains (cells) of the information space (Chang et al., 2002). A related area with similar performance issues is the dissemination of messages in publish/subscribe systems (Banavar et al., 1999).
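
The mapping from communication interest to groups, and the capacity wasted by coarse cells, can be illustrated with a minimal Python sketch. It assumes the simpler static model (a), with a fixed uniform grid over the information space, rather than the dynamic optimum-size partitioning developed in the chapter. The function names, the axis-aligned interest region, and the `cell_size` parameter are all hypothetical and chosen only for illustration.

```python
from itertools import product
from math import floor

def cell_of(point, cell_size):
    """Map a point of the information space to the index of its grid cell."""
    return tuple(floor(x / cell_size) for x in point)

def cells_for_interest(lower, upper, cell_size):
    """Return every grid cell overlapped by an axis-aligned interest region.

    Each cell stands for one multicast group (assumption: one group per cell).
    """
    lo = cell_of(lower, cell_size)
    hi = cell_of(upper, cell_size)
    ranges = [range(a, b + 1) for a, b in zip(lo, hi)]
    return set(product(*ranges))

def inside(point, lower, upper):
    """True if the point lies within the client's interest region."""
    return all(l <= x <= u for x, l, u in zip(point, lower, upper))

def wasted_fraction(messages, lower, upper, cell_size):
    """Fraction of delivered messages that fall outside the interest region."""
    subscribed = cells_for_interest(lower, upper, cell_size)
    delivered = [m for m in messages if cell_of(m, cell_size) in subscribed]
    if not delivered:
        return 0.0
    unwanted = [m for m in delivered if not inside(m, lower, upper)]
    return len(unwanted) / len(delivered)

# Hypothetical client interested in the region [0.5, 1.5] x [0.5, 0.6].
groups = cells_for_interest((0.5, 0.5), (1.5, 0.6), cell_size=1.0)
msgs = [(0.6, 0.5), (0.1, 0.1), (1.9, 0.9), (2.5, 0.5)]
waste = wasted_fraction(msgs, (0.5, 0.5), (1.5, 0.6), cell_size=1.0)
```

In this toy setting the client joins two groups and receives three of the four messages, two of which lie outside its interest region. Smaller cells would reduce that waste at the price of more groups to join and maintain, which is the trade-off that motivates choosing the cell size carefully.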

To support the increasing complexity of Internet applications, including group communication, a more powerful network abstraction is needed. A new paradigm of virtualized networks models all communicating entities (network clients, proxies, storage servers, application-level routers) as static and/or dynamic objects labeled with network overlay identifiers. The network virtualization infrastructure consists of several abstraction layers with distinct functions (network measurement, data distribution, communication control), and communication entities may be assigned multiple identifiers in various layers; each layer’s topology is modeled as a graph connecting the participating entities indexed in that layer. The communication infrastructure is optimized within each layer, by selecting rules for vertex index and edge assignment that minimize or maximize a given communication objective, and across layers, by optimally mapping graphs from different layers. An example of a design optimization problem is the joint minimization of average communication delays and excess communication bandwidth capacity (the additional bandwidth required to route data packets using application-level routers), given the network infrastructure measurements and the application and infrastructure constraints.
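
One way to picture such a joint objective is the following minimal Python sketch, which scores a candidate overlay distribution tree by a weighted sum of the two terms. It is not the chapter's formulation. The weight `alpha`, the child-to-parent tree representation, and the definition of excess bandwidth as the number of overlay transmissions beyond one per receiver are all assumptions made for illustration. In practice the two terms would also need to be normalized to comparable units.

```python
def path_delay(parent, delay, node, source):
    """Sum the measured edge delays from the source down to a node."""
    total = 0.0
    while node != source:
        total += delay[(parent[node], node)]
        node = parent[node]
    return total

def joint_cost(parent, delay, receivers, source, alpha=0.5):
    """Weighted sum of average receiver delay and excess transmissions.

    parent maps every non-source tree node to its upstream node, so
    len(parent) is the number of overlay edges (transmissions per packet).
    """
    avg_delay = sum(
        path_delay(parent, delay, r, source) for r in receivers
    ) / len(receivers)
    # Assumption: the baseline is one transmission per receiver; relaying
    # through non-receiver proxies adds transmissions beyond that baseline.
    excess = len(parent) - len(receivers)
    return alpha * avg_delay + (1 - alpha) * excess

# Hypothetical measurements (milliseconds) for source "s", proxy "p",
# and receivers "a" and "b".
delay = {("s", "a"): 30.0, ("s", "b"): 40.0,
         ("s", "p"): 10.0, ("p", "a"): 5.0, ("p", "b"): 5.0}

direct = {"a": "s", "b": "s"}              # source sends to each receiver
relayed = {"p": "s", "a": "p", "b": "p"}   # source sends via the proxy

cost_direct = joint_cost(direct, delay, ["a", "b"], "s")
cost_relayed = joint_cost(relayed, delay, ["a", "b"], "s")
best = min((cost_direct, "direct"), (cost_relayed, "relayed"))[1]
```

With these made-up numbers the relayed tree has the lower cost: it uses one extra transmission but more than halves the average delay. Changing `alpha` shifts the balance between the two objectives.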
