Total System Intervention for System Failure: Methodology and Its Application to ICT Systems

Takafumi Nakamura (Fujitsu FSAS Inc., Japan) and Kyoichi Kijima (Tokyo Institute of Technology, Japan)
Copyright: © 2013 | Pages: 21
DOI: 10.4018/978-1-4666-3998-0.ch014

In this paper, total system intervention for system failure (TSI for SF) is proposed as a way to prevent further occurrences of system failures. TSI is a critical systems practice for managing complex and differing viewpoints. First, the authors introduce a meta-methodology called “system of system failures” (SOSF) as a common language that helps the various stakeholders improve their understanding of system failures. An application scenario, “TSI for SF,” is then proposed. The SOSF and related methodologies are used in the subsequent discussion and debate to agree on who is responsible for the failure and to identify the preventative measures to be applied. An application example in information and communication technologies engineering demonstrates that the proposed “TSI for SF” helps prevent future system failures by learning from previous ones. Three actions are identified for preventing further system failures: closing the gap between the stakeholders, introducing absolute goals, and enlarging the system boundary.
Chapter Preview

1. Introduction

There are many examples of similar system failures recurring and of quick fixes creating negative side effects. Introducing redundant safety mechanisms does little to reduce human error. As Perrow (1999, p. 260) pointed out, the more redundancy is used to promote safety, the greater the chance of spurious actuation; “redundancy is not always the correct design option to use.” While instrumentation is being improved to enable operators to run their operations more efficiently and certainly with greater ease, the risk would seem to remain about the same.

Weick and Sutcliffe (2001, p. 81) explained why traditional total quality management (TQM) has failed. “We interpret efforts by organizations to embrace the quality movement as the beginning of a broader interest in reliability and mindfulness. But some research shows that quality programs have led to only modest gains...this might be the result of incomplete adoption. But we would go even further, and argue that the reason for incomplete adoption is the necessary infrastructure for reliable practice…is not in place even where TQM success stories are the rule. The conclusion is consistent with W.E. Deming’s insistence that quality comes from broad-based organizational vigilance for problems other than those found through standard statistical control methods.”

A catastrophic disaster passes through six stages, from initial beliefs to full cultural readjustment (Turner & Pidgeon, 1997, p. 88): Stage I: Initial beliefs and norms, Stage II: Incubation period, Stage III: Precipitating event, Stage IV: Onset, Stage V: Rescue and salvage, and Stage VI: Full cultural readjustment. The second stage, the incubation period, is hard to identify because of the various side effects of quick fixes (Turner & Pidgeon, 1997). The second stage therefore plays a crucial role in leading to a catastrophic disaster. Many side effects of quick fixes to information and communication technologies (ICT) systems have been identified (Nakamura & Kijima, 2009a). Two factors in particular make it difficult to prevent ICT system failures: the lack of a common language for understanding system failures and the lack of a methodology for preventing future system failures. These shortcomings result in local optimization and the introduction of quick fixes as countermeasures. Habermas (1970, 1975, 1984) argued that there are two fundamental conditions underpinning the sociological life of human beings: ‘work: technical interest’ and ‘interaction: practical interest’. Disagreements between individuals and groups are just as much a threat to the socio-cultural form of life as a failure to predict and control. The core idea of intervention methodologies is to accommodate multiple stakeholders and to identify the best methodology for restoring a failed system.

We propose using the “system of system failures” (SOSF) meta-methodology to provide a common language for understanding system failures among the various stakeholders. We also propose using “total system intervention for system failure” (TSI for SF) as a methodology for preventing future system failures of the same type. The SOSF meta-methodology and a stakeholder matrix are used within the TSI for SF methodology. Application examples from ICT systems are used to demonstrate that the TSI for SF methodology is effective.
