A Two-Level Fuzzy Value-Based Replica Replacement Algorithm in Data Grids

Nazanin Saadat (Department of Computer Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran) and Amir Masoud Rahmani (Department of Computer Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran)
Copyright: © 2016 | Pages: 22
DOI: 10.4018/IJGHPC.2016100105

One of the challenges of a data grid is to provide fast, efficient access to widely distributed data while maximizing data availability and minimizing latency. Data replication addresses this challenge by creating and storing replicas, which makes the same data accessible from different locations of the data grid and can shorten file retrieval time. However, because the number of grid sites and their storage capacity are limited, an effective replacement algorithm is needed to improve the efficiency of replication. In this paper, the authors propose a novel two-level replacement algorithm which uses the Fuzzy Replica Preserving Value Evaluator System (FRPVES) to evaluate the value of each replica. The algorithm was tested using OptorSim, a grid simulator developed by the European Data Grid project. Simulation results show that the proposed algorithm performs better than other algorithms in terms of job execution time, total number of replications, and effective network usage.

1. Introduction

“Grid” computing has emerged as an important field, distinguished from conventional distributed computing by its focus on large-scale resource sharing, innovative applications, and, in some cases, high-performance orientation (Foster, Kesselman, and Tuecke, 2001). Grid technologies make it possible for scientific collaborations and institutes to share resources on an unprecedented scale and for geographically distributed groups to work together in ways that were previously impossible (Foster, 2002). Job scheduling (Kazem, Rahmani, and Aghdam, 2008), (Baghban and Rahmani, 2008), (Adabi, Movaghar, and Rahmani, 2014), resource discovery (Adabi, Movaghar, Rahmani, and Beigy, 2012), (Navimipour, Rahmani, Navin, and Hosseinzadeh, 2014), (Navin, Navimipour, Rahmani, and Hosseinzadeh, 2014), power management (Ouyang, Chiang, Hsu, and Yi, 2014), and data replication (Yuhanis, 2014), (Grace and Manimegalai, 2014) are the most important challenges in data grid environments.

A data grid, which is one of the various types of grids, connects a collection of geographically distributed computers and storage resources (Rahmani, Fadaie, and Chronopoulos, 2015) that may be located in different parts of a country or even in different countries, and enables users to share data and other resources. In a distributed computing environment, many different resources are available, such as large-volume data storage, supercomputers, video equipment, and so on (Wang, Chen, Deng, and Huang, 2011). Among all these resources, data grids (Souri and Rahmani, 2014) primarily deal with providing services and infrastructure for distributed data-intensive applications, which often require access to large amounts of data (terabytes or petabytes). Examples of such applications include high-energy physics, climate modeling, bioinformatics, brain activity analysis, image processing, earthquake engineering, astronomy, and ray tracing (Amoon, 2012). Managing such huge, widely distributed amounts of data and such complex distributed environments in a centralized location is ineffective: it increases data access time and introduces problems such as a single point of failure and bottlenecks. Also, for most users, it becomes more difficult and sometimes impossible to address all requirements on a single computing platform or, for that matter, in a single location (Wang, Chen, Deng, and Huang, 2011), (Wu, Lai, and Lai, 2012). Therefore, this huge amount of data should be replicated and distributed across several physical locations of the distributed system to avoid such problems. A data grid retrieves data from the closest grid site that holds it and replicates it at the requesting site when needed.
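The on-demand behavior described in the last sentence can be illustrated with a minimal sketch. It is not the authors' algorithm: the names `Site`, `fetch`, and the `cost` table are hypothetical, "closest" is assumed to mean the lowest entry in a given cost table (for example, latency), and storage capacity is ignored, so no replacement decision is made here.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """A grid site holding a set of file names (hypothetical model)."""
    name: str
    files: set = field(default_factory=set)


def fetch(requester, filename, sites, cost):
    """Serve `filename` to `requester` and return the name of the serving site.

    `cost` maps (requester_name, holder_name) to an assumed access cost.
    If the file is not local, it is copied from the cheapest holder and
    a replica is stored at the requester (capacity is not modeled).
    """
    if filename in requester.files:
        return requester.name  # local hit, nothing to transfer

    holders = [s for s in sites if filename in s.files]
    if not holders:
        raise LookupError(f"no site holds {filename!r}")

    # Pick the holder with the lowest access cost from the requester.
    source = min(holders, key=lambda s: cost[(requester.name, s.name)])
    requester.files.add(filename)  # create the replica on demand
    return source.name
```

With three sites where `B` and `C` both hold a file and `B` is cheaper to reach from `A`, the first call to `fetch` for that file from `A` is served by `B`, and later calls are served locally. Once a site's storage fills up, a step like `requester.files.add` would require evicting an existing replica, which is where a replacement algorithm comes in.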

As the study of data grids becomes more popular, replica management techniques are increasingly the subject of research (Yuhanis, 2014). Data replication is an effective method which supports the management of Virtual Organization (VO) storage (and perhaps also network and computing) resources to maximize data access performance with respect to metrics such as response time, reliability, and cost (Foster, Kesselman, and Tuecke, 2001). The major goal of this technique is to reduce the execution cost of a job, which depends not only on the computational resource assignment but also on the location of the data files required by the job (Abdurrab and Xie, 2010). The replication process involves creating identical copies of data files (called replicas) and placing them at various locations of the grid. In addition, if data can be kept close to users via replication, data access performance can be improved greatly (Rahman, Barker, and Alhajj, 2008). This can reduce data access latency, response time, and bandwidth consumption, and increase the data availability, reliability, fault tolerance, scalability, load balancing, and robustness of grid applications.
