Consistency of Replicated Datasets in Grid Computing
Gianni Pucciani (CERN, European Organization for Nuclear Research, Switzerland), Flavia Donno (CERN, European Organization for Nuclear Research, Switzerland), Andrea Domenici (University of Pisa, Italy) and Heinz Stockinger (Swiss Institute of Bioinformatics)
Copyright: © 2009
Data replication is a well-known technique used in distributed systems in order to improve fault tolerance and make data access faster. Several copies of a dataset are created and placed at different nodes, so that users can access the replica closest to them, and at the same time the data access load is distributed among the replicas. In today’s Grid middleware solutions, data management services allow users to replicate datasets (i.e., flat files or databases) among storage elements within a Grid, but replicas are often considered read-only because of the absence of mechanisms able to propagate updates and enforce replica consistency. This entry analyzes the replica consistency problem and provides hints for the development of a Replica Consistency Service, highlighting the main issues and pros and cons of several approaches.
Replica consistency is the property exhibited by a set of data items, such as files or databases located at different nodes of a Grid, that contain the same information; when these data items are modifiable, all of them should be updated (or synchronized) so that consistency is maintained. Replica consistency is a well-studied research topic with roots in distributed systems and in distributed database management systems, where it is sometimes referred to as external consistency (Cellary et al., 1988). It is closely related to data replication, a technique used pervasively in Grids to achieve fast data access, high availability, increased fault tolerance, and better load balancing. Data replication involves databases, files, and possibly other units of information, such as objects or records, and relies on the functions provided by “plain” file systems, storage systems, database management systems, and middleware services. Existing Grids currently offer little support, if any, for data consistency. Often, data is treated as read-only, i.e., it is consistent by definition because no updates are allowed on existing data items.
The rest of this entry presents an introduction to the problem in the Background section, where the key concepts are introduced. Furthermore, the data management capabilities provided by middleware services currently used in some of the largest Grids are reviewed, pointing out their approach to replica management and support for replica synchronization. The core analysis of the problem is presented in the Main Focus section, where the main issues in the development of a Replica Consistency Service for Data Grids are discussed.
Key Terms in this Chapter
Physical File Name: The name of a replicated file that identifies its location.
Data Replication: Creating and managing multiple copies of datasets. These copies are typically kept synchronized.
Heterogeneous Database Synchronization: Synchronization used to enforce consistency among replicated databases from different vendors.
Strict Synchronization: Updating all the replicas of the same dataset in a single transaction to make sure that replicas are never outdated.
Replica Synchronization: The task of updating replicas in order to enforce their consistency.
Replica Catalogue: A catalogue used to locate replicas by mapping logical file names to the physical locations of the replicas.
Replica Management System: A Grid service that takes care of replicating datasets and keeping track of locations in a Replica Catalogue.
Lazy Synchronization: Allowing for certain delays in the update process, i.e. replicas can be outdated for a certain time.
Logical File Name: A name used to identify a set of replicated files.
Replica Consistency: The property exhibited by a set of replicas that contain the same information.
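The relationship between several of these terms can be illustrated with a small in-memory sketch. It is not the design of any Grid middleware; every name in it (Replica, ReplicaCatalogue, strict_update, LazySynchronizer, the example.org locations) is hypothetical and chosen only for illustration. It assumes that replicas are plain byte strings, that the first registered replica acts as the primary copy for lazy updates, and that updates never fail, so the strict variant omits the rollback a real single-transaction update would need.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One physical copy of a dataset; pfn stands in for a Physical File Name."""
    pfn: str
    content: bytes = b""


@dataclass
class ReplicaCatalogue:
    """Hypothetical catalogue mapping each Logical File Name to its replicas."""
    entries: dict = field(default_factory=dict)

    def register(self, lfn, replica):
        self.entries.setdefault(lfn, []).append(replica)

    def locate(self, lfn):
        """Return the physical file names registered under a logical file name."""
        return [r.pfn for r in self.entries.get(lfn, [])]

    def is_consistent(self, lfn):
        """Replica consistency: all replicas hold the same information."""
        return len({r.content for r in self.entries.get(lfn, [])}) <= 1


def strict_update(catalogue, lfn, new_content):
    """Strict synchronization (simplified): update every replica before returning.
    Assumption: no update can fail, so no rollback logic is shown."""
    for replica in catalogue.entries[lfn]:
        replica.content = new_content


class LazySynchronizer:
    """Lazy synchronization: update the primary now, the other replicas later."""

    def __init__(self, catalogue):
        self.catalogue = catalogue
        self.pending = []  # queued (lfn, content) updates, oldest first

    def update(self, lfn, new_content):
        # Only the first replica (assumed primary) changes immediately.
        self.catalogue.entries[lfn][0].content = new_content
        self.pending.append((lfn, new_content))

    def propagate(self):
        # Apply queued updates in order to the non-primary replicas.
        for lfn, content in self.pending:
            for replica in self.catalogue.entries[lfn][1:]:
                replica.content = content
        self.pending.clear()


catalogue = ReplicaCatalogue()
for pfn in ("site-a.example.org/run1.dat", "site-b.example.org/run1.dat"):
    catalogue.register("lfn:run1", Replica(pfn, b"v1"))

strict_update(catalogue, "lfn:run1", b"v2")   # replicas never diverge
lazy = LazySynchronizer(catalogue)
lazy.update("lfn:run1", b"v3")                # replicas diverge for a while
stale = not catalogue.is_consistent("lfn:run1")
lazy.propagate()                              # consistency is restored
```

Under these assumptions, the strict variant never exposes outdated replicas but must reach every replica before an update completes, while the lazy variant returns after updating one copy and tolerates a window during which the other replicas are outdated.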