A Perspective on the Standardization of Autonomic Detection of Service Level Agreement Violations

Jéferson Campos Nobre (University of Vale do Rio dos Sinos, Brazil) and Lisandro Zambenedetti Granville (Federal University of Rio Grande do Sul, Brazil)
Copyright: © 2019 | Pages: 17
DOI: 10.4018/978-1-5225-7146-9.ch011

Abstract

Service level agreements (SLAs) define the conditions under which networked services established between providers and their customers are expected to operate. Measurement mechanisms can be used to support SLA monitoring. However, these mechanisms are expensive in terms of resource consumption. In addition, if the number of SLA violations at any given time exceeds the number of available measurement sessions, some violations will likely be missed. The current best practice is to observe just a subset of network destinations, chosen according to the expertise of a few human administrators. This mode of observation is error-prone, reactive, and scales poorly. It can lead to SLA violations being missed, which hampers the reliability of the SLA monitoring process. In this context, autonomic network features can improve such processes, especially when they are deployed in a decentralized manner. The use of these autonomic features is described in RFC 8316. The authors expect that this document can lead to better SLA monitoring tools and methods.

Introduction

The communication requirements of distributed services running on top of an IP-based infrastructure have become increasingly demanding. Some examples are healthcare applications (eHealth) and data-intensive science applications (eScience). Providing such services with an adequate level of quality, as typically documented in the Service Level Specification (SLS) that pertains to the Service Level Agreement (SLA), depends on accommodating requirements that are usually expressed in terms of metrics such as inter-packet delay variation, packet loss, or latency. Such requirements usually lead to the definition of Service Level Objectives (SLOs) that must be met. Those SLOs are part of the SLAs that define a contract between the provider and the consumer of the service.

Performance requirements can be employed effectively by both service providers and customers. In this context, SLOs reflect a service-level guarantee that the consumer of the service can expect to receive. Likewise, the provider of a service needs to ensure that the service-level guarantee and associated SLOs are met. SLAs usually stipulate financial or other penalties for cases in which SLOs are not met, possibly including the risk of the contract being cancelled. In addition, adequate support of SLAs improves the commercial reputation of the service provider among prospective customers.

The detection of SLA violations is based on identifying deviations from the contracted SLOs. To identify these deviations by using active measurements, measurement sessions must be activated on key end-to-end network destinations. However, such activation is expensive in terms of resource consumption (both human and computational), not to mention the monitoring traffic itself, which may degrade the performance of network devices and consume network bandwidth. Better monitoring coverage requires more active sessions and therefore consumes more resources. On the other hand, observing just a subset of all network flows decreases resource consumption, but it can lead to insufficient coverage.

The decision about how to place measurement sessions is an important management issue, since it impacts the SLA monitoring coverage. The goal is to obtain the maximum coverage with a limited amount of measurement overhead. Specifically, the goal is to maximize the number of SLA violations that are detected with a limited amount of resources. In this context, a feasible approach would be to add up the service levels observed across different path segments. This allows the decomposition of a large set of end-to-end measurements into a much smaller set of segment measurements. However, some end-to-end service levels cannot be determined by an additive approach. Examples of metrics that are inadequate for additive approaches are end-to-end jitter and mean opinion scores; these must therefore be measured end-to-end (Nobre, Granville, Clemm, & Prieto, 2018a).
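
The additive approach described above can be illustrated with a minimal Python sketch. It is not taken from the chapter or from RFC 8316: the linear four-node topology, the segment delay values, the 12 ms SLO threshold, and all function names are assumptions chosen for illustration only. The sketch estimates the delay of every end-to-end pair on the chain from just three segment measurements and flags the pairs whose estimate exceeds the SLO.

```python
from itertools import combinations

# Hypothetical linear topology A - B - C - D (an assumption for illustration).
NODES = ["A", "B", "C", "D"]

# Hypothetical one-way delay measured on each segment, in milliseconds.
SEGMENT_DELAY_MS = {
    ("A", "B"): 4.0,
    ("B", "C"): 7.5,
    ("C", "D"): 3.0,
}


def chain_path(nodes, src, dst):
    """Return the nodes traversed from src to dst on the linear chain."""
    i, j = nodes.index(src), nodes.index(dst)
    return nodes[i:j + 1]


def path_delay_ms(path, delays):
    """Estimate end-to-end delay by summing the delays of consecutive segments."""
    return sum(delays[(a, b)] for a, b in zip(path, path[1:]))


def estimate_all_pairs(nodes, delays):
    """Derive a delay estimate for every node pair from segment measurements."""
    return {
        (src, dst): path_delay_ms(chain_path(nodes, src, dst), delays)
        for src, dst in combinations(nodes, 2)
    }


def slo_violations(estimates, slo_ms):
    """List the pairs whose estimated delay exceeds the delay SLO."""
    return [pair for pair, delay in estimates.items() if delay > slo_ms]


estimates = estimate_all_pairs(NODES, SEGMENT_DELAY_MS)
violating = slo_violations(estimates, slo_ms=12.0)  # 12 ms is an assumed SLO
```

In this toy setting, three segment measurements stand in for six end-to-end measurements. The same summation would not be valid for jitter or mean opinion scores, which is why those metrics still need end-to-end sessions.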

Often, the current best practice for activating measurement sessions within a provider’s network consists of relying on the network administrator’s expertise to determine the destinations for which monitoring sessions should be activated. This practice has major shortcomings. In particular, it copes poorly with the high dynamics and growing complexity of network environments and delivered services. To provide solutions that better suit such dynamics and complexity, network-wide management solutions can be employed. Network-wide control of network devices can improve their ability to accomplish management tasks. For example, a distributed network management algorithm can let some devices provide additional resources for the execution of management tasks by other devices. This can be useful when either the computational load is not equally distributed among the network devices or the computational resources of the devices are heterogeneous. In this context, the global capability of the devices in a network can be greater than the sum of the capabilities of each device.

Key Terms in this Chapter

Measurement Session: A communications association between a probe and a responder used to send and reflect synthetic test traffic for active measurements.

Responder: The destination for synthetic test traffic in an active measurement.

Autonomic Service Agent (ASA): An agent implemented on an autonomic node that implements an autonomic function, either in part (in the case of a distributed function, as in the context of this chapter) or whole.

Passive Measurements: Techniques used to measure service levels based on observation of production traffic.

SLA: Service level agreement.

Autonomic Network: A network containing exclusively autonomic nodes, requiring no configuration, and deriving all required information through self-knowledge, discovery, or intent.

Active Measurements: Techniques to measure service levels that involve generating and observing synthetic test traffic.

P2P: Peer-to-peer.

Probe: The source of synthetic test traffic in an active measurement.

SLO: Service level objective.
