1. Introduction
“Grid” computing has emerged as an important field, distinguished from conventional distributed computing by its focus on large-scale resource sharing, innovative applications, and, in some cases, high-performance orientation (Foster, Kesselman, and Tuecke, 2001). Grid technologies make it possible for scientific collaborations and institutions to share resources on an unprecedented scale and for geographically distributed groups to work together in ways that were previously impossible (Foster, 2002). Job scheduling (Kazem, Rahmani, and Aghdam, 2008), (Baghban and Rahmani, 2008), (Adabi, Movaghar, and Rahmani, 2014), resource discovery (Adabi, Movaghar, Rahmani, and Beigy, 2012), (Navimipour, Rahmani, Navin, and Hosseinzadeh, 2014), (Navin, Navimipour, Rahmani, and Hosseinzadeh, 2014), power management (Ouyang, Chiang, Hsu, and Yi, 2014), and data replication (Yuhanis, 2014), (Grace and Manimegalai, 2014) are among the most important challenges in data grid environments.
A data grid, one of the various types of grids, connects a collection of geographically distributed computers and storage resources (Rahmani, Fadaie, and Chronopoulos, 2015) that may be located in different parts of a country or even in different countries, and enables users to share data and other resources. In a distributed computing environment, many different resources are available, such as large-volume data storage, supercomputers, video equipment, and so on (Wang, Chen, Deng, and Huang, 2011). Among all these resources, data grids (Souri and Rahmani, 2014) primarily provide services and infrastructure for distributed data-intensive applications, which often require access to large amounts of data (terabytes or petabytes). Examples of such applications include high-energy physics, climate modeling, bioinformatics, brain activity analysis, image processing, earthquake engineering, astronomy, and ray tracing (Amoon, 2012). Managing such a huge and widely distributed amount of data in a centralized location is ineffective: it increases data access time and introduces problems such as a single point of failure and bottlenecks. Moreover, for most users it is difficult and sometimes impossible to meet all requirements on a single computing platform or, for that matter, at a single location (Wang, Chen, Deng, and Huang, 2011), (Wu, Lai, and Lai, 2012). Therefore, this data should be replicated and distributed across several physical locations of the distributed system to avoid such problems. A data grid retrieves data from the closest grid site that holds it and replicates it at the requester site when needed.
As research on data grids has grown in popularity, replica management techniques have received increasing attention (Yuhanis, 2014). Data replication is an effective method that supports the management of Virtual Organization (VO) storage (and perhaps also network and computing) resources to maximize data access performance with respect to metrics such as response time, reliability, and cost (Foster, Kesselman, and Tuecke, 2001). The major goal of this technique is to reduce the execution cost of a job, which depends not only on the computational resource assignment but also on the location of the data files that the job requires (Abdurrab and Xie, 2010). The replication process involves creating identical copies of data files (called replicas) and placing them at various locations in the grid. In addition, if data can be kept close to users via replication, data access performance can be improved greatly (Rahman, Barker, and Alhajj, 2008). This can reduce data access latency, response time, and bandwidth consumption, and increase the data availability, reliability, fault tolerance, scalability, load balancing, and robustness of grid applications.
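The retrieve-from-closest-site-and-replicate behavior described above can be sketched in a few lines. This is only an illustrative model, not an implementation from any of the cited works: the class and function names and the use of network latency as the sole distance metric are our own assumptions.

```python
# Illustrative sketch of closest-replica selection with local caching.
# The names (GridSite, fetch) and the latency-only cost model are
# assumptions made for this example, not taken from the cited literature.

class GridSite:
    def __init__(self, name, latency_ms):
        self.name = name              # site identifier
        self.latency_ms = latency_ms  # network latency from the requester
        self.replicas = set()         # data files stored at this site

def fetch(requester, sites, filename):
    """Serve `filename` to `requester` from the closest replica holder,
    then replicate it locally to speed up future accesses."""
    if filename in requester.replicas:
        return requester.name         # local hit: no remote transfer
    holders = [s for s in sites if filename in s.replicas]
    if not holders:
        raise FileNotFoundError(filename)
    closest = min(holders, key=lambda s: s.latency_ms)
    requester.replicas.add(filename)  # create a replica at the requester
    return closest.name

# Usage: two remote sites hold "data.h5"; the nearer one serves the request,
# and the second access is satisfied from the freshly created local replica.
local = GridSite("local", 0)
sites = [GridSite("site_A", 120), GridSite("site_B", 40)]
sites[0].replicas.add("data.h5")
sites[1].replicas.add("data.h5")
print(fetch(local, sites, "data.h5"))  # site_B (lower latency)
print(fetch(local, sites, "data.h5"))  # local (now cached)
```

In a real data grid the "distance" would combine bandwidth, storage cost, and load rather than raw latency, and replica placement would be governed by a dedicated strategy, but the access pattern is the same.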