1. Introduction
Evolving business and customer requirements for ubiquitous computing, immediate access, and more data and data analytics have created new demands for instantaneous “always-available” information. This affects expectations for data centers, both during their regular operations and also when disasters and other disabling events strike. Disabling events include “logical intrusions”, such as when hackers steal information, lock up systems, or initiate denial of service attacks, and physical events, such as tornadoes, hurricanes, winter storms, fires, tsunamis, earthquakes, and power outages.
Many recent high-profile incidents illustrate our exposure to natural disasters. For example, Miller et al. (2006) report how our growing reliance on computing and telecommunications technologies exacerbates this vulnerability. Because computing and telecommunication technologies depend on data centers, data center risk is a particular concern. Examples of disasters affecting data centers include solar storms (Lloyds of London, 2013); hurricanes (Hardy and Wortham, 2012); earthquakes (Maerowitz, 2017); electrical surges (Gorman, 2013); and fires (Jones, 2012). To deal with these threats and the ensuing disaster-related events, Engemann et al. (2005) provide a methodology for disaster management in information technology (IT). This methodology incorporates the relationships among threats, events, control alternatives, and losses.
In this paper, we restrict our scope to physical exogenous events that affect data centers, such as those mentioned above. We will not focus on logical intrusions, and will mention information access and privacy only in passing. This is not to downplay their importance but only to sharpen the focus of our analysis.
The managerial approach to facilitating data center resilience in the face of natural events is to develop a Disaster Recovery Plan (DRP), which enables Information Technology (IT) to maintain or restore the systems and communication capabilities of the organization. The traditional focus of disaster recovery planning was to ensure that IT was resilient (for more on data center resiliency see Jayashankar, 2014; Mohamed, 2011; and Tam, 2011). A resilient system has a “bounce back” capability when faced with a systemic shock, such as a natural disaster. Because providing services is systemic, depending not only on computer processing power but also on telecommunications, people, and other services, the scope of planning and the processes implemented need to cover all critical areas of an organization. This need led to the expanded field of Business Continuity Management (BCM), a holistic management program that identifies potential events that threaten an organization and provides a framework for building resilience (Engemann and Henderson 2012; Moore and Bone 2017; and Aronis and Stratopoulos 2016). BCM includes the processes and procedures that an organization must put in place to ensure that its mission-critical functions continue during and after crisis events. Because organizations depend on each other and coordinate with supply chain partners, stakeholders and regulators also need to ensure that proper business continuity plans are in place for when crisis events occur. Satisfying these requirements means that an effective BCM process enables business function performance in both the near and long term.