In the early 1990s, Ian Foster and Carl Kesselman proposed the idea of “The Grid”, which applies to computing the concept of plugging into a grid for metered utility service (Wallis, 2008; Douglis, 2009). This concept has since evolved into what is now called Cloud Computing. With Cloud Computing, data is stored in the “cloud” of the Internet, and web-based applications are used to access the data and perform various tasks. Amazon was one of the first providers of such a service with its Elastic Compute Cloud (EC2). EC2 charged for its resources by the hour, much as an electric company charges its customers per kilowatt-hour of usage. Presently, three major categories of Cloud Computing systems are offered: software-as-a-service (SaaS), platform-as-a-service (PaaS), and infrastructure-as-a-service (IaaS) (Viega, 2009). Each of these services, as well as the emerging Database-as-a-Service, gives customers a variety of choices to meet their specific needs.
1.1. Comparing and Contrasting Grid and Cloud Computing
On the surface, there is little difference between a Grid and a Cloud environment. Both are collections of machines, such as servers, network devices, and computers, that appear to the user as a single resource. This resource provides services such as Web infrastructure, databases, application support, and more. So what truly sets these two environments apart?
Grids were originally designed to let researchers allocate large amounts of resources for complex computations that would take significantly longer on a single machine. In the Grid, scientists can allocate as many nodes as necessary to complete their task and leave the remaining nodes available for other such computations. The nodes are allocated only for as long as it takes to complete the task. Once the task is completed, the nodes are released back into the Grid to be allocated to another process.
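The allocate-and-release cycle described above can be sketched in a few lines of Python. This is only an illustrative model, not the interface of any real Grid scheduler: the class `GridPool`, its methods, and the job name are all hypothetical.

```python
class GridPool:
    """Toy model of a Grid: nodes are held only while a job is running."""

    def __init__(self, total_nodes):
        self.free = set(range(total_nodes))   # node ids not in use
        self.leases = {}                      # job id -> set of node ids

    def allocate(self, job_id, count):
        """Reserve `count` free nodes for `job_id` (hypothetical API)."""
        if job_id in self.leases:
            raise ValueError(f"job {job_id!r} already holds nodes")
        if count > len(self.free):
            raise RuntimeError("not enough free nodes")
        nodes = {self.free.pop() for _ in range(count)}
        self.leases[job_id] = nodes
        return nodes

    def release(self, job_id):
        """Return the job's nodes to the pool once its task is finished."""
        self.free |= self.leases.pop(job_id)


pool = GridPool(total_nodes=10)
pool.allocate("simulation", 6)   # six nodes are busy, four stay available
pool.release("simulation")       # all ten nodes are available again
```

The point of the sketch is the short lifetime of the lease: nodes belong to a job only between `allocate` and `release`.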
In contrast to Grids, Cloud environments are geared towards smaller requests for resources. Instead of requesting 2000 nodes to perform some difficult task, a client might request just enough resources to meet personal or business needs. The resources in a Cloud are also more persistent than those in a Grid: allocated resources remain in place until the client decides to cancel the service.
Another difference is that the environment in a Cloud is generally virtual. If somebody requests a resource such as a Web server, they are not given an entire physical server on which to host it. Instead, they are allocated a virtual environment in which their Web server runs, and several such environments may share each physical server.
While research on Grid systems has been working towards tying together resources from different administrative domains, Cloud Computing systems are currently restricted to a single domain.
A further key difference lies in how Cloud Computing stores and distributes data. Relational databases, as typically used in an Enterprise environment, do not scale well because their data cannot easily be distributed across the domain. A relational database takes a large performance hit if its files are spread across a large number of nodes. If the workload doubles or triples overnight, you would have to upgrade the server quickly, which is not an easy task.
By using a distributed hash table, you can distribute your data across thousands of nodes without relying on one, or even several, centralized database servers. Not only is this good for redundancy, it also allows for faster insertion and retrieval for things such as Web services.
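As a rough illustration of how a distributed hash table spreads data over nodes, the following Python sketch uses consistent hashing, one common way to build such a table; the source does not say which scheme any particular Cloud provider uses. The class `HashRing`, the node names, and the replica count are assumptions made for the example, and the per-node dictionaries stand in for real remote servers.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a point on the ring using SHA-1."""
    return int(hashlib.sha1(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring: a key belongs to the next nodes clockwise."""

    def __init__(self, nodes, replicas=2):
        self.replicas = replicas
        self._ring = sorted((_hash(n), n) for n in nodes)
        self._points = [point for point, _ in self._ring]
        self._store = {n: {} for n in nodes}   # stand-in for remote storage

    def owners(self, key):
        """Return the distinct nodes responsible for `key`."""
        start = bisect.bisect_right(self._points, _hash(key))
        count = min(self.replicas, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1]
                for i in range(count)]

    def put(self, key, value):
        """Write the value to every owner, so one failure loses nothing."""
        for node in self.owners(key):
            self._store[node][key] = value

    def get(self, key, down=()):
        """Read from the first owner that is not listed as down."""
        for node in self.owners(key):
            if node not in down:
                return self._store[node].get(key)
        raise KeyError(key)


ring = HashRing(["node-a", "node-b", "node-c", "node-d"], replicas=2)
ring.put("user:42", {"name": "Ada"})
first_owner = ring.owners("user:42")[0]
# The value is still readable when the first owner is unavailable.
value = ring.get("user:42", down=(first_owner,))
```

No central server is consulted to find a key: any client that knows the node list can compute the owners locally, which is where both the redundancy and the fast lookups come from.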