Modeling and Evaluating the Effects of Big Data Storage Resource Allocation in Global Scale Cloud Architectures

Enrico Barbierato (Seconda Università degli Studi di Napoli, Caserta, Italy), Marco Gribaudo (Politecnico di Milano, Milano, Italy) and Mauro Iacono (Seconda Università degli Studi di Napoli, Caserta, Italy)
Copyright: © 2016 |Pages: 20
DOI: 10.4018/IJDWM.2016040101

Abstract

The availability of powerful, worldwide-spanning computing facilities that offer application scalability through cloud infrastructures is a perfect match for the resource needs that characterize Big Data applications. The elasticity of cloud resources enables application providers to achieve levels of complexity, performance and availability that were previously considered unaffordable, by means of proper resource management techniques and a savvy design of the underlying architecture and communication facilities. This paper presents a technique for evaluating the combined effects of cloud elasticity and a Big Data-oriented data management layer on global-scale cloud applications, by modeling the behavior of both typical in-memory and in-storage data management.

Introduction

Planning and managing resources in geographically distributed, cloud-oriented infrastructures requires a proper understanding of the phenomena that take place in a very complex architecture. Geographical distribution greatly improves the availability of data and services, as well as performance, since the physical decoupling of sites strengthens resistance to catastrophic events and lowers network congestion. However, in systems that are already composed of many racks (each hosting many computing and storage units that support the execution of a non-negligible number of tasks), geographical distribution adds another layer of complexity, affecting the predictability of resource usage.

Resource planning and management aims to design proper policies for workload balancing, optimization and data replication. The costs of distribution stem not only from the construction and maintenance of data centers and communication infrastructures, but also from duplication and synchronization issues, which are critical factors for cost optimization. Designing proper policies is also crucial when the overall infrastructure has to be reconfigured, e.g. because of an (overall or local) hardware upgrade, an extension (adding new data centers or significantly expanding existing ones) or damage (a data center hit by a natural disaster or a prolonged power outage).

The distribution and replication of data can also be exploited to provide a performance benefit to users, by properly dispatching requests to the nearest node or to the least congested one in terms of unused incoming bandwidth. In this light, the management of space allocation should be tuned so that replication can be exploited to compensate for the costs of transfers and migrations.
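As a purely illustrative sketch of the dispatching idea described above (not a policy from this paper), the following code routes each request to the nearest replica, falling back to the replica with the most unused incoming bandwidth when the nearest one is congested. All node names, rates and the congestion threshold are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    distance_ms: float        # network distance from the client
    capacity_mbps: float      # incoming bandwidth capacity
    load_mbps: float          # currently used incoming bandwidth

    @property
    def unused_mbps(self) -> float:
        return self.capacity_mbps - self.load_mbps

def dispatch(nodes, congestion_threshold=0.8):
    """Pick the nearest node; if its incoming bandwidth is saturated
    beyond the threshold, pick the node with the most unused bandwidth."""
    nearest = min(nodes, key=lambda n: n.distance_ms)
    if nearest.load_mbps / nearest.capacity_mbps < congestion_threshold:
        return nearest
    return max(nodes, key=lambda n: n.unused_mbps)

nodes = [
    Node("eu-west", 20.0, 1000.0, 900.0),   # near, but 90% loaded
    Node("us-east", 90.0, 1000.0, 300.0),   # far, but mostly idle
]
print(dispatch(nodes).name)  # us-east: the nearest node is congested
```

Such a policy trades the latency cost of a farther replica against the queueing cost of a congested one, which is exactly the compensation between replication benefits and transfer costs mentioned above.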

Moreover, the increasing diffusion of commercial services and applications based on massively parallel computations over very large data sets, which require a continuously varying amount of allocated resources depending on the instantaneous workload and operation schedule (usually referred to as Big Data applications (Castiglione, Gribaudo, Iacono, & Palmieri, 2014a; Castiglione, Gribaudo, Iacono, & Palmieri, 2014b; Barbierato, Gribaudo, & Iacono, 2013a; Barbierato, Gribaudo, & Iacono, 2013b; Barbierato, Gribaudo, & Iacono, 2014; Cerotti, Gribaudo, Iacono, & Piazzolla, 2014; Cerotti, Gribaudo, Iacono, & Piazzolla, 2015)), further complicates data migration and synchronization policies and requirements.

Coping with such complex scenarios and problems requires proper support from models and evaluation techniques that can scale up to a very large number of configurations in the state space of the system, can compute performance parameters in adequate time, and can support a flexible configuration of both architecture and workload. Classical state-space-based techniques, such as Petri net variants and their evolutions, as well as simulation-based techniques, suffer from the number of parameters that must be handled and from the size of the architectural model to set up.

In this paper we propose a modeling approach for evaluating the effects of storage allocation policies in geographically distributed, global-scale cloud architectures, based on a state-space-based modeling technique, namely Markovian Agents (MA), which uses a continuous approximation of Markovian processes and is suited to systems with large state spaces characterized by compositions of replicated elementary behaviors. In particular, we focus on the reliability of the considered systems and exploit the properties of the agents to model the on-off behavior of the components. This is a novel extension and constitutes the innovative contribution of this paper. The approach is informally introduced and applied to the evaluation of a realistic scenario that encompasses both in-memory and in-storage Big Data applications.
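To give an intuition of the continuous-approximation idea behind techniques such as MA (this is a generic mean-field sketch, not the model developed in this paper), consider a population of identical components that alternate between an "on" and an "off" state with exponential failure and repair rates. Instead of tracking each component individually, one tracks the fraction x(t) of components that are on, which evolves according to a single ordinary differential equation regardless of the population size. The rates below are illustrative.

```python
fail_rate = 0.1    # on -> off (failure), per hour; illustrative value
repair_rate = 0.9  # off -> on (repair), per hour; illustrative value

def fraction_on(t_end, x0=1.0, dt=0.001):
    """Euler integration of the mean-field ODE
    dx/dt = repair_rate * (1 - x) - fail_rate * x,
    i.e. the fraction of agents in the 'on' state over time."""
    x, t = x0, 0.0
    while t < t_end:
        x += dt * (repair_rate * (1.0 - x) - fail_rate * x)
        t += dt
    return x

# The ODE converges to the steady-state availability
# repair_rate / (fail_rate + repair_rate) = 0.9,
# independently of how many components the system contains.
print(round(fraction_on(100.0), 3))
```

This collapse of a combinatorial state space (2^N on-off configurations) into one deterministic equation is what makes such approaches scale where explicit state-space enumeration does not.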

The paper is organized as follows: after this introduction, a related work Section introduces the reader to the main themes of global-scale cloud architectures and MA; a description of the overall modeling approach and a case study are given in the next two Sections; conclusions and future work close the paper.
