Replica Placement Strategy for Data Grid Environment

Mohammed K. Madi (Universiti Utara Malaysia, Sintok, Kedah, Malaysia), Yuhanis Yusof (Universiti Utara Malaysia, Sintok, Kedah, Malaysia) and Suhaidi Hassan (Universiti Utara Malaysia, Sintok, Kedah, Malaysia)
Copyright: © 2013 | Pages: 12
DOI: 10.4018/jghpc.2013010105

Abstract

A Data Grid is an infrastructure that manages huge amounts of data files and provides intensive computational resources across geographically distributed collaborations. To increase resource availability and to ease resource sharing in such an environment, replication services are needed. Data replication is one of the methods used to improve the performance of data access in distributed systems: multiple copies of data files are placed at the distributed sites. Replica placement is the process of identifying where to place copies of replicated data files in a Grid system. Existing work identifies suitable sites based on the number of requests and the read cost of the required file. Such approaches consume large amounts of bandwidth and increase computational time. The authors propose a replica placement strategy (RPS) that finds the best locations to store replicas based on four criteria, namely, 1) Read Cost, 2) File Transfer Time, 3) Sites’ Workload, and 4) Replication Sites. OptorSim is used to evaluate the performance of this replica placement strategy. The simulation results show that RPS requires less execution time and incurs lower network usage than the existing approaches of Simple Optimizer and LFU (Least Frequently Used).

Introduction

A Data Grid (Chervenak, 2003; Foster et al., 2002) is an infrastructure that deals with huge amounts of data and enables grid applications to share data files in a coordinated manner. Such an approach is seen to provide fast, reliable and transparent data access. Nevertheless, realizing it is a challenging problem in grid environments because the volume of data to be shared is large while storage space and network bandwidth are limited. Furthermore, the resources involved are heterogeneous, as they belong to different administrative domains in a distributed environment.

However, it is infeasible for all users to access a single instance of data (e.g., a data file) held by a single organization (e.g., a site), as this would increase data access latency. Furthermore, a single organization may not be able to handle such a huge volume of data by itself. Motivated by these considerations, data grids, like other distributed systems, commonly adopt a strategy known as replication. Replication provides efficient access without large bandwidth consumption or access latency (Chervenak et al., 2001; Chervenak et al., 2002; Guy et al., 2002; Lamehamedi et al., 2003; Otoo et al., 2002; Ranganathan & Foster, 2001b). The replication technique is one of the major factors affecting the performance of data grids (You et al., 2006). Creating replicas makes it possible to reroute client requests to certain replica sites and offer a higher access speed (Tang et al., 2005).

Replication is also bounded by two factors: the size of the storage available at different sites within the Data Grid and the bandwidth between these sites (Venugopal et al., 2006). Furthermore, the files in a Data Grid are mostly large (Rahman et al., 2009), so replication to every site is infeasible. Therefore, the optimal locations to host certain popular files must be decided in order to reduce the bandwidth consumption of the network. In this work, we propose a Replica Placement Strategy (RPS) to find the best sites to host newly created replicas. The proposed model addresses the problems of current replication models, which can be summarized in two points:

  1. A large amount of network bandwidth is consumed because existing systems utilize the network poorly (Chang, 2006; Rasool et al., 2008; Ruay-Shiung et al., 2008; Shorfuzzaman et al., 2008; Tang et al., 2005; Tang et al., 2006; Wang et al., 2007; Yang et al., 2007).

  2. This poor utilization of network bandwidth in turn increases job execution time (Mansouri et al., 2008; Pangfeng & Jan-Jan, 2006; Ranganathan & Foster, 2001a; Ranganathan et al., 2002; Ruay-Shiung et al., 2008; Yi-Fang et al., 2006).

The proposed work is expected to minimize network bandwidth consumption and reduce job execution time.

Many studies in the literature address replica placement issues. Wang et al. (2007) proposed a replica placement scheme that tries to overcome the bottleneck caused by a growing number of simultaneous downlinks. The strategy chooses the best site to host the replica by evaluating candidate sites on the number of user requests and the transmission cost. Its purpose is to replicate the file to the site that provides the minimum average transmission cost, where transmission cost is defined to be inversely proportional to bandwidth.
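The selection rule described above can be illustrated with a minimal sketch. It is not the implementation of Wang et al. (2007): the function names, the request-weighted averaging, the symmetric bandwidth table, and the assumption that local access costs nothing are all hypothetical choices made for illustration. Only the two ingredients named in the text are taken from the source, namely the number of user requests and a transmission cost inversely proportional to bandwidth.

```python
def average_transmission_cost(candidate, requests, bandwidth):
    """Request-weighted average cost of serving every site from `candidate`.

    requests:  dict mapping site name -> number of requests for the file
    bandwidth: dict mapping frozenset({site_a, site_b}) -> link bandwidth
    Cost of one transfer is taken as 1 / bandwidth (assumption: unit constant).
    """
    total_requests = sum(requests.values())
    if total_requests == 0:
        return 0.0
    weighted_cost = 0.0
    for site, count in requests.items():
        if site == candidate:
            continue  # assumption: a local read costs nothing
        link = frozenset((site, candidate))
        weighted_cost += count * (1.0 / bandwidth[link])
    return weighted_cost / total_requests


def choose_replica_site(requests, bandwidth):
    """Return the site with the minimum average transmission cost."""
    return min(
        requests,
        key=lambda site: average_transmission_cost(site, requests, bandwidth),
    )


# Hypothetical three-site grid; all numbers are made up for illustration.
example_requests = {"A": 10, "B": 30, "C": 5}
example_bandwidth = {
    frozenset(("A", "B")): 100.0,
    frozenset(("A", "C")): 10.0,
    frozenset(("B", "C")): 50.0,
}
best_site = choose_replica_site(example_requests, example_bandwidth)
```

In this made-up example, site B generates most of the requests and has the fastest links to the other two sites, so it yields the lowest average cost and is selected. A rule of this kind considers only request counts and bandwidth, which is the limitation the four RPS criteria are meant to address.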
