Cloud Computing and Big Data

DOI: 10.4018/978-1-4666-4683-4.ch006


This chapter explores the intersection of cloud computing and big data, discussing big data analysis, mining, and privacy concerns. First, it deals with MapReduce™, the software framework commonly used for performing big data analysis in the cloud. It then details some of the most widely used techniques for big data mining: clustering, co-clustering, and association rules. In particular, the k-center problem is described, while for association rules, beyond the basic definitions, the Apriori algorithm is outlined and illustrated with numerical examples. These techniques are also described with reference to their MapReduce-based versions. Finally, a description of some real applications concludes the chapter.
1. Introduction

Big data has become a buzzword, much as cloud computing did a few years ago. The term big data denotes a data set whose size exceeds the capacity of traditional databases. Such large data sets represent a rich source of information. The data comes from everywhere, and in particular from the Internet: sensors used to gather climate information, posts to social networks and social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals, to name a few. The sheer amount of data being collected by companies around the world, especially for business purposes, is astonishing, and it therefore has a high economic impact.

In addition, a huge amount of big data is generated by research in biology, medicine, and astrophysics, to name a few fields. In 2010, Avanade®, a global business technology solutions and managed services provider, published the results of a research survey on the business impact of big data. Figure 1 summarizes the top sources of data and highlights that e-mail, at 72%, is the major source of big data. It is worth noting that the 543 surveyed respondents and IT decision-makers (from 17 countries across North America, Europe, and Asia Pacific) were allowed to select up to 3 choices.

Figure 1. Main big data sources. Source: (Avanade, 2010).

Figure 2 plots the results of the same survey with reference to the top big data producers. Management is the top producer of big data, which mostly concerns information on customers, products, services, and activities. The big data stream generated by the public sector, by contrast, is expected to increase in the foreseeable future.

Figure 2. Main big data producers. Source: (Avanade, 2010).

However, data in its raw form cannot increase knowledge. It needs to be properly processed in order to extract the relevant information, for example as structured data, and to acquire knowledge. For instance, the raw data generated by industry has to be properly analyzed by managers in order to obtain the information needed to forecast the market and to react quickly to customer needs. In this way, companies reach higher service levels and, as a result, become more competitive.

The more data is available, the higher the quality of the analysis can be. For instance, in some simulation-based applications, the quantity of inputs strongly affects the quality of the outcome. Effective tools can be adopted to process the raw data and consequently extract information. Thus, the availability of a huge amount of data is seen as a great advantage. For companies interested in making market forecasts, the accuracy of the forecasting methods strongly depends on the quantity of historical data.

Few companies besides the very biggest have been able to successfully mine their data resources, but this is a situation that is rapidly changing. As anticipation of big data opportunities grows, businesses today feel that they can no longer afford to do nothing; now is the time to act if they are not to be left behind.

Most organizations are still in the early stages, and few have thought through an enterprise approach or realized the profound impact that big data will have on their infrastructure, organizations and industries. Companies can no longer afford to ignore the opportunities that simply cannot be met with the traditional data streams and practices. Meanwhile, companies feel forced to act due to the never-ending media hype around big data.
