Introduction
The planning and management of resources in geographically distributed, cloud-oriented infrastructures require a proper understanding of the phenomena that take place in a very complex architecture. Geographical distribution greatly improves both performance and the availability of data and services, as the physical decoupling of sites strengthens resistance to catastrophic events and reduces network congestion. However, in systems that are already composed of many racks (each of which hosts many computing and storage units that support the execution of a non-negligible number of tasks), geographical distribution adds another layer of complexity, which affects the predictability of resource usage.
Resource planning and management aim to design proper policies for workload balancing and optimization and for data replication. The costs of distribution are caused not only by the construction and maintenance of data centers and communication infrastructures, but also by data duplication and synchronization, which are critical activities to consider for cost optimization. Designing proper policies is also crucial when the overall infrastructure has to be reconfigured, e.g. because of an (overall or local) hardware upgrade, an extension (adding other data centers or significantly expanding existing ones) or damage (a data center is hit by a natural disaster or by a prolonged power outage).
The distribution and replication of data can also be exploited to provide a performance benefit to users, by properly dispatching requests to the nearest node or to the least congested one in terms of unused incoming bandwidth. In these terms, space allocation should be tuned according to how far replication can be exploited to compensate for the costs due to transfers and migrations.
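As an illustration only, the two dispatching criteria mentioned above can be sketched as follows. The `Replica` type, its fields and the sample sites and figures are all hypothetical and are not taken from the paper; the sketch just shows that "nearest" and "least congested" may select different nodes for the same request.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Replica:
    """A site holding a copy of the requested data (hypothetical type)."""
    name: str
    distance_km: float        # assumed distance from the requesting user
    free_inbound_mbps: float  # assumed unused incoming bandwidth


def nearest(replicas: Sequence[Replica]) -> Replica:
    """Pick the replica closest to the user."""
    return min(replicas, key=lambda r: r.distance_km)


def least_congested(replicas: Sequence[Replica]) -> Replica:
    """Pick the replica with the most unused incoming bandwidth."""
    return max(replicas, key=lambda r: r.free_inbound_mbps)


# Made-up sample data, for illustration only.
replicas = [
    Replica("dc-east", distance_km=120.0, free_inbound_mbps=40.0),
    Replica("dc-west", distance_km=900.0, free_inbound_mbps=350.0),
    Replica("dc-north", distance_km=450.0, free_inbound_mbps=200.0),
]

print(nearest(replicas).name)
print(least_congested(replicas).name)
```

In this made-up configuration the closest site is also the most loaded one, which is the kind of trade-off a dispatching policy has to weigh against the cost of keeping additional replicas.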
Moreover, data migration and synchronization policies and requirements are further complicated by the increasing diffusion of commercial services and applications based on massively parallel computations on very large data sets, which require a continuously variable amount of allocated resources depending on the instantaneous workload and operation schedule. These are usually referred to as Big Data applications (Castiglione, Gribaudo, Iacono, & Palmieri, 2014a; Castiglione, Gribaudo, Iacono, & Palmieri, 2014b; Barbierato, Gribaudo, & Iacono, 2013a; Barbierato, Gribaudo, & Iacono, 2013b; Barbierato, Gribaudo, & Iacono, 2014; Cerotti, Gribaudo, Iacono, & Piazzolla, 2014; Cerotti, Gribaudo, Iacono, & Piazzolla, 2015).
Coping with such complex scenarios and problems requires proper support from models and evaluation techniques that can scale up to a very large number of configurations in the state space of the system, can compute performance parameters in adequate time, and can support a flexible configuration of both architecture and workload. Classical state-space-based techniques, such as Petri net variants and evolutions, and simulation-based techniques suffer from the number of parameters that must be handled and from the size of the architectural model to set up.
In this paper we propose a modeling approach for evaluating the effects of storage allocation policies in geographically distributed, global-scale cloud architectures. The approach is based on a state-space-based modeling technique, namely Markovian Agents (MA), which uses a continuous approximation of Markovian processes and is suited to systems with large state spaces characterized by compositions of replicated elementary behaviors. In particular, we focus on the reliability of the considered systems, and we exploit the properties of the agents to model the on-off behavior of the components. This is a novel extension and constitutes the innovative contribution of this paper. The approach is informally introduced and applied to the evaluation of a realistic scenario that encompasses both in-memory and storage-based Big Data applications.
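To give a rough intuition of what a continuous approximation of replicated on-off components can look like, the following minimal sketch tracks the fraction of components that are "on" in a large population of identical, independent two-state components. It is not the MA model used in this paper: it ignores spatial distribution and interactions among agents, and the failure and repair rates, function names and integration step are assumptions chosen only for illustration.

```python
def on_fraction(t_end: float, fail_rate: float, repair_rate: float,
                x0: float = 1.0, dt: float = 0.001) -> float:
    """Fraction of components that are on at time t_end.

    Forward-Euler integration of the fluid equation
        dx/dt = -fail_rate * x + repair_rate * (1 - x)
    where x is the fraction of components in the on state.
    All parameters are hypothetical illustration values.
    """
    x = x0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        x += dt * (-fail_rate * x + repair_rate * (1.0 - x))
    return x


def steady_state_on_fraction(fail_rate: float, repair_rate: float) -> float:
    """Closed-form equilibrium of the same equation (set dx/dt = 0)."""
    return repair_rate / (fail_rate + repair_rate)


# Assumed rates (per unit of time), for illustration only.
FAIL_RATE = 0.1
REPAIR_RATE = 0.9

print(on_fraction(50.0, FAIL_RATE, REPAIR_RATE))
print(steady_state_on_fraction(FAIL_RATE, REPAIR_RATE))
```

The point of the sketch is that a single differential equation replaces the explicit enumeration of the 2^N on-off configurations of N components, which is why this kind of approximation remains tractable as the number of replicated elements grows.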
The paper is organized as follows: after this introduction, a related work Section introduces the reader to the main themes of both global-scale cloud architectures and MA; the next two Sections present the overall modeling approach and a case study; finally, conclusions and future work close the paper.