Introduction
The central promise of cloud services, as opposed to traditional application deployment with fixed assignments of applications to execution resources, is elasticity (Mell & Grance, 2011) and thus flexibility: as demand for specific services grows or shrinks, appropriate resources can be allocated to optimize efficient delivery of all services. For stateless services, this essentially requires starting appropriate software on appropriate machines and adapting network and request routing. For “stateful” services, however, in particular for database services, growing and shrinking the set of servers dedicated to a specific service requires re-partitioning the data and moving it as appropriate.
Moving a large data set is slow. For example, a modern disk drive with 2 TB storage capacity and a transfer bandwidth of 200 MB/s requires 10,000 seconds, or about 3 hours, just to read all data. A service for which adding or removing nodes takes hours provides only an unsatisfactory degree of elasticity, in particular if competitors, using different techniques, can add or drop nodes in minutes.
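The transfer-time estimate above can be reproduced with a short back-of-the-envelope calculation (a sketch only; decimal units are assumed, i.e., 1 TB = 10^12 bytes, and the figures mirror the illustrative disk in the text rather than any particular device):

```python
def transfer_time_seconds(capacity_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Lower bound on the time needed just to read (or copy) a data set
    sequentially at the given sustained bandwidth."""
    return capacity_bytes / bandwidth_bytes_per_s

TB = 10**12  # decimal terabyte, as disk vendors specify capacity
MB = 10**6   # decimal megabyte

seconds = transfer_time_seconds(2 * TB, 200 * MB)
print(f"{seconds:.0f} s = {seconds / 3600:.1f} h")  # 10000 s = 2.8 h
```

This lower bound ignores seek overhead, network transfer, and write time on the receiving node, all of which only lengthen the process in practice.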
We believe that the speed of elastic adaptation is a crucial quality of a cloud service. We further believe that solving a minimized prototypical instance of the problem – adding a single node to a single existing node and bringing the new node up-to-date efficiently – is sufficient to explore alternative techniques. Therefore, solving this minimized instance of elasticity is the first major step taken in this paper. Ultimately, once a single node can be added to a cloud database system, scaling the whole system from a small instance to a medium instance to a large or extra-large instance can be accomplished. However, optimization techniques for large-scale growing or shrinking of the system by many nodes at a time are not investigated further here and are left for future work.
Our contribution is a technique that permits bringing a new node up-to-date in small, useful steps. In other words, the new node becomes useful for query processing (and thus aids the scalability of the cloud service) almost instantly and incrementally during the update process. Moreover, the overall duration of the update process is similar to that of traditional log shipping techniques, which bring a new node up-to-date all at once but may take hours. Finally, we describe query optimization techniques, specifically query execution plans and their cost functions, for nodes that are partially updated and that continue incremental updates between query optimization and query execution as well as during query execution.
The remainder of this paper is structured as follows. The next section revisits prior work and gives an overview of the relevant techniques, which are combined in novel ways in our solution. The following section states the general problem and explains how it is reduced to a minimal scenario, followed by another section that illustrates our approach to support function shipping and data shipping to a single node that joins a cluster of a database system. The penultimate section explains how the solution for the single-node scenario can be adapted to the general case. We also show several benefits that can be reaped from the elasticity thus achieved. The last section gives conclusions.