Multi-Layer Network Performance and Reliability Analysis

Kostas N. Oikonomou (AT&T Labs Research, USA), Rakesh K. Sinha (AT&T Labs Research, USA) and Robert D. Doverspike (AT&T Labs Research, USA)
DOI: 10.4018/978-1-60960-505-6.ch008


The authors describe a methodology for evaluating the performability (combined performance and reliability) of large communications networks. Networks are represented by a four-level hierarchical model consisting of a traffic matrix, a network graph, “components” representing failure modes, and reliability information. Network states are assignments of modes to the network components, which usually represent network elements and their key modules, although they can be more abstract. The components can be binary or multi-modal, and each of their failure modes may change a set of attributes of the graph (e.g. the capacity or cost of a link). The methodology also captures the effect of automatic restoration against network failures by including two common rerouting methods. To compute network performability measures, including upper and lower bounds on their cumulative distribution functions, the authors augment existing probabilistic state-space generation algorithms with a new “hybrid” algorithm. To characterize the network failures of highest impact, they compute the Pareto boundaries of the network’s risk space. The authors have developed a network analysis tool called nperf that embodies this methodology. To illustrate the methodology and the practicality of the tool, they describe a performability analysis of three design alternatives for a large commercial IP backbone network.
Chapter Preview


Network service providers usually guarantee a certain level of service to their enterprise customers by specifying limits on network measurements such as downtime, restoration delay, and packet delay. An example of such a guarantee, also called a “service-level agreement” (SLA), is “with 99.99% probability, at most 5% of the network traffic will be unavailable”. Violation of these agreements results in a penalty to the service providers. In turn, the enterprise customers incur business losses if their networks do not perform as expected. Some aspects of the service level involve only reliability, such as downtime, while others are pure performance measures, such as packet delay. Each can be evaluated or predicted in isolation by an appropriate model and technique. However, as networks become larger and more complex, it is less realistic to consider only pure reliability measures or to evaluate performance measures as if the network were always in its perfect (no-failure) state.

Even though the complexity of real networks often requires that performance and reliability evaluations be treated separately from each other, a combined, simultaneous evaluation, known as performability analysis, is more useful (Meyer, 1995). Performability analysis characterizes failures probabilistically and evaluates a performance measure (or measures) over a large set of network states, one of which is the no-failure state while the rest represent combinations of failures. One can then estimate the expected value of a measure, or the probability that the measure does not exceed a threshold. These results often expose the inadequacy of separate performance and reliability analyses, but the simultaneous analysis is much more difficult. For example, if the effects of restoration after network failure are included, performability computation becomes significantly more complex. See Colbourn (1999) for more on network reliability vs. performability evaluation.
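As a rough illustration of the two estimates just mentioned, the sketch below computes bounds on the expected value of a measure and on the probability that it does not exceed a threshold, given only a partial set of evaluated network states. It is not taken from the chapter or from nperf: the function names, the dictionary keys, the sample probabilities, and the assumption that the measure lies between 0 and a known worst value are all hypothetical. The unexplored probability mass is simply assigned to the best or worst case to obtain the bounds.

```python
from typing import Dict, List, Tuple


def performability_summary(
    states: List[Tuple[float, float]],
    threshold: float,
    worst_value: float,
) -> Dict[str, float]:
    """Bound E[measure] and P(measure <= threshold) from explored states.

    states: (probability, measure) pairs for the states evaluated so far.
    Assumption (hypothetical): the measure always lies in [0, worst_value],
    e.g. percentage of traffic lost with worst_value = 100.
    """
    explored = sum(p for p, _ in states)
    unexplored = max(0.0, 1.0 - explored)  # mass of states never evaluated

    # CDF at the threshold: unexplored states may or may not satisfy it.
    cdf_lower = sum(p for p, m in states if m <= threshold)
    cdf_upper = min(1.0, cdf_lower + unexplored)

    # Expected value: unexplored states contribute between 0 and worst_value.
    partial_mean = sum(p * m for p, m in states)
    return {
        "unexplored": unexplored,
        "cdf_lower": cdf_lower,
        "cdf_upper": cdf_upper,
        "mean_lower": partial_mean,
        "mean_upper": partial_mean + unexplored * worst_value,
    }


def sla_status(summary: Dict[str, float], target_probability: float) -> str:
    """Classify an SLA of the form 'P(measure <= threshold) >= target'."""
    if summary["cdf_lower"] >= target_probability:
        return "met"
    if summary["cdf_upper"] < target_probability:
        return "violated"
    return "undetermined"  # more states must be explored to decide


if __name__ == "__main__":
    # Made-up example: (state probability, percent of traffic lost).
    explored_states = [(0.90, 0.0), (0.06, 2.0), (0.03, 8.0)]
    summary = performability_summary(explored_states, threshold=5.0, worst_value=100.0)
    print(summary)
    print(sla_status(summary, target_probability=0.9999))
```

The gap between the lower and upper bounds equals the unexplored probability mass, so exploring more states tightens both bounds.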

Network performability models, like stand-alone reliability or performance models, fall into two basic categories. One approach is to consider the effect of equipment failure and network restoration on the performance of a fixed “reference” connection, chosen to represent either the typical or the worst case. This allows very detailed and precise results to be obtained (Oikonomou, 2006), but ignores the interaction among connections in a network. The other approach, which we employ here, is a performability analysis of the entire network. There are two main points to note. First, the complexity of any realistic network necessitates a description in various layers and a failure mapping between these layers. Second, because the size of the state space is exponential in the number of failure modes, exploring all possible failure states is impractical. An approach proven successful in practice is to generate failures in order of probability, i.e. most likely first, and evaluate the performance measures on these states as they are generated.
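The idea of generating states most likely first can be sketched for the simplest case of independent binary components. The code below is a generic heap-based enumeration, not the chapter's “hybrid” algorithm or any specific published one; the function name and the sample failure probabilities are hypothetical, and it assumes every component fails with probability strictly between 0 and 0.5, so that each additional failure makes a state less likely.

```python
import heapq
import math
from itertools import islice
from typing import Iterator, List, Tuple


def states_by_probability(
    fail_probs: List[float],
) -> Iterator[Tuple[float, Tuple[int, ...]]]:
    """Yield (probability, failed component indices), most probable first.

    Assumptions (hypothetical): components are independent and binary, and
    each failure probability q satisfies 0 < q < 0.5.
    """
    if any(not 0.0 < q < 0.5 for q in fail_probs):
        raise ValueError("each failure probability must be in (0, 0.5)")

    n = len(fail_probs)

    # Failing component i multiplies the no-failure probability by q/(1-q);
    # cost = -log of that ratio is positive, so fewer/cheaper failures win.
    def ratio_cost(i: int) -> float:
        return -math.log(fail_probs[i] / (1.0 - fail_probs[i]))

    order = sorted(range(n), key=ratio_cost)  # cheapest failure first
    cost = [ratio_cost(i) for i in order]
    base = math.prod(1.0 - q for q in fail_probs)

    yield base, ()  # the no-failure state is always the most probable
    if n == 0:
        return

    # Heap entries: (total cost, positions in `order` of the failed components).
    heap = [(cost[0], (0,))]
    while heap:
        total, subset = heapq.heappop(heap)
        yield base * math.exp(-total), tuple(sorted(order[k] for k in subset))
        last = subset[-1]
        if last + 1 < n:
            nxt = cost[last + 1]
            # Child 1: additionally fail the next component in cost order.
            heapq.heappush(heap, (total + nxt, subset + (last + 1,)))
            # Child 2: swap the last failed component for the next one.
            heapq.heappush(heap, (total - cost[last] + nxt, subset[:-1] + (last + 1,)))


if __name__ == "__main__":
    # Made-up failure probabilities for three components.
    for prob, failed in islice(states_by_probability([0.01, 0.1, 0.05]), 5):
        print(f"{prob:.5f}  failed={failed}")
```

Because the generator is lazy, a caller can stop as soon as the accumulated probability of the generated states is high enough for the desired accuracy, evaluating the performance measure on each state as it arrives.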
