Introduction
With the increasing globalization of contemporary business organizations, distributed databases and their management have become a key area of database research. A distributed database is a single logical database spread across multiple computers. There are two basic options for distributing a database: data partitioning and data replication. Data replication is one of the important decisions in organizations (Km & Eom, 2016); it refers to the creation of identical copies of data (replicas). Data partitioning instead splits a table into subsets of rows (horizontal partitioning) or subsets of columns (vertical partitioning).
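The difference between the two options can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: the toy table, the site names, and the helper functions `replicate`, `partition_horizontally`, and `partition_vertically` are hypothetical and do not come from any of the surveyed systems.

```python
# Hypothetical toy table; the rows and column names are illustrative only.
rows = [
    {"id": 1, "name": "Ada", "region": "EU", "balance": 120},
    {"id": 2, "name": "Bob", "region": "US", "balance": 75},
    {"id": 3, "name": "Chen", "region": "EU", "balance": 310},
]


def replicate(table, sites):
    """Replication: every site holds an identical copy of the whole table."""
    return {site: [dict(r) for r in table] for site in sites}


def partition_horizontally(table, key):
    """Horizontal partitioning: each fragment holds a subset of the rows."""
    fragments = {}
    for r in table:
        fragments.setdefault(r[key], []).append(dict(r))
    return fragments


def partition_vertically(table, column_groups, primary_key="id"):
    """Vertical partitioning: each fragment holds a subset of the columns.

    The primary key is kept in every fragment so rows can be rejoined.
    """
    return [
        [{c: r[c] for c in [primary_key, *cols]} for r in table]
        for cols in column_groups
    ]


replicas = replicate(rows, ["site_a", "site_b"])
by_region = partition_horizontally(rows, "region")
by_columns = partition_vertically(rows, [["name"], ["region", "balance"]])
```

With replication, both sites can answer any query locally, at the price of storing the table twice. With partitioning, each fragment is smaller, but a query that spans fragments has to contact more than one site.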
This paper presents a survey of data replication strategies in cloud systems. Data replication improves data availability, response time, and fault tolerance, and reduces network traffic. It is frequently used in: (i) DBMS (Pérez, García-Carballeira, Carretero, Calderón, & Fernández, 2010), (ii) parallel and distributed systems (Loukopoulos, Lampsas, & Ahmad, 2005; Benoit, Rehn-Sonigo, & Robert, 2008), (iii) mobile systems (Tos, Mokadem, Hameurlain, Ayav, & Bora, 2016), and (iv) large-scale systems, including P2P (Xhafa, Kolici, Potlog, Spaho, Barolli, & Takizawa, 2012) and data grid systems (Mansouri, Azad, & Chamkori, 2014). Many of the proposed replication strategies aim to answer the following questions:
- What data should be replicated?
- When should the data be replicated?
- Where should the new replicas be placed?
Data replication is a necessary tool for effectively managing a database in distributed database environments. Most works in the literature classify replication strategies according to the following criteria: (i) static vs. dynamic classification (Chervenak, Deelman, Foster et al., 2002; Čibej, Slivnik, & Robič, 2005), (ii) centralized vs. decentralized replication (Sashi & Thanamani, 2011; Amjad, Sher, & Daud, 2012; Grace & Manimegalai, 2014), (iii) server vs. client replication (Doğan, 2009; Steen & Pierre, 2010), (iv) objective function based classification (Mokadem & Hameurlain, 2015), and (v) system architecture based classification (Tos, Mokadem, Hameurlain et al., 2015). However, the existing replication strategies are not adapted to cloud systems. They aim to obtain the best performance without taking into account the profit of cloud providers or the satisfaction of tenant requirements. Creating as many replicas as possible in clouds may not be economically feasible. Hence, replication strategies in such environments should ensure both the tenant's Quality of Service (QoS) and the economic profitability of the provider.
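The two-sided requirement stated above, tenant QoS and provider profitability, can be sketched as a simple decision rule. This is a minimal illustration, not a strategy taken from the surveyed literature: the `ReplicationContext` fields, the response-time threshold, and the cost model are all hypothetical assumptions.

```python
from dataclasses import dataclass


@dataclass
class ReplicationContext:
    # All fields are hypothetical quantities chosen for illustration.
    observed_response_ms: float  # response time currently seen by the tenant
    sla_response_ms: float       # response-time threshold agreed with the tenant
    revenue: float               # provider revenue from the tenant per billing period
    current_cost: float          # provider cost of serving the tenant per billing period
    replica_cost: float          # extra cost per period of one additional replica


def should_create_replica(ctx: ReplicationContext) -> bool:
    """Replicate only if QoS is violated and the provider stays profitable."""
    qos_violated = ctx.observed_response_ms > ctx.sla_response_ms
    profit_after = ctx.revenue - (ctx.current_cost + ctx.replica_cost)
    return qos_violated and profit_after > 0
```

A purely performance-driven strategy would correspond to checking only `qos_violated`. The second condition is what keeps the provider from creating replicas that cost more than the tenant pays for.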
We propose another classification of replication strategies, based on the following five dimensions:
- Static vs. dynamic (Ghemawat, Gobioff, & Leung, 2003; Bai, Jin, Liao et al., 2013);
- Reactive vs. proactive workload balancing (Silvestre, Monnet, Krishnaswamy et al., 2012; Hussein & Mousa, 2014);
- Provider-centric vs. customer-centric (Sakr & Liu, 2012; Sousa & Machado, 2012);
- Minimal blocking probability (Xue, Shen, & Guo, 2015) and energy efficiency and bandwidth consumption (Boru, Kliazovich, Granelli et al., 2015);
- Objective function (Bonvin, Papaioannou, & Aberer, 2011; Kirubakaran, Valarmathy, & Kamalanathan, 2013; Tos et al., 2016).