Data Intensive Cloud Computing: Issues and Challenges

Jayalakshmi D. S., R. Srinivasan, K. G. Srinivasa
Copyright: © 2016 |Pages: 16
DOI: 10.4018/978-1-4666-9840-6.ch029
Abstract

Processing Big Data is a major challenge for today's technology. New ways of computing must be found, applied, and analyzed to derive business and scientific value from Big Data. Cloud computing, with its promise of seemingly infinite computing resources, is seen as a solution to this problem. Data intensive computing on the cloud builds upon already mature parallel and distributed computing technologies such as HPC, grid, and cluster computing. However, handling Big Data in the cloud presents its own challenges. In this chapter, we analyze issues specific to data intensive cloud computing and provide a study of available solutions in programming models, data distribution and replication, and resource provisioning and scheduling with reference to data intensive applications in the cloud. Future directions for research enabling data intensive applications in cloud environments are also identified.
Introduction

Massive amounts of data are being generated in scientific, business, social network, healthcare, and government domains. The “Big Data” so generated is typically characterized by the three Vs: Volume, Variety, and Velocity. Big Data comes in large volumes, from a large number of domains, and in different formats. Data can be structured, semi-structured, or unstructured, though most Big Data is unstructured; the data sets may also grow in size rapidly. There are many opportunities to analyze Big Data to derive value for business, scientific, and user-experience applications. Such applications process data in the range of many terabytes or petabytes and are called data intensive applications. Consequently, computing systems capable of storing and manipulating massive amounts of data are required, along with software systems and algorithms that analyze the data to derive useful information and knowledge in a timely manner.

In this chapter we present the characteristics of data intensive applications in general and discuss the requirements of data intensive computing systems. Further, we identify the challenges and research issues in implementing data intensive computing systems in cloud computing environment. Later in this chapter, we also present a study on programming models, data distribution and replication, resource provisioning and scheduling with reference to data intensive applications in cloud.

Data Intensive Computing Systems

Data Intensive Computing is defined as “a class of parallel computing applications which use a data parallel approach to processing large volumes of data” (“Data Intensive Computing”, 2012). Such applications devote most of their processing time to I/O and manipulation of data rather than computation (Middleton, 2010). According to the National Science Foundation, data intensive computing requires a “fundamentally different set of principles” from other computing approaches. Several important common characteristics distinguish data intensive computing systems from other forms of computing (Middleton, 2010):

  • Data and applications or algorithms are co-located so that data movement is minimized, which is essential to achieving high performance in data intensive computing.

  • Programming models that express the high level operations on data such as data flows are used, and the runtime system transparently controls the scheduling, execution, load balancing, communications and movement of computation and data across the distributed computing cluster.

  • They provide reliability, availability and fault tolerance.

  • They are linearly scalable to handle large volumes of data.
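The second characteristic above — a programming model that expresses high-level operations on data while the runtime handles scheduling, data movement, and fault tolerance — is exemplified by MapReduce. The sketch below is a minimal, single-process illustration of that data-parallel style, not taken from the chapter: the word-count job, the function names, and the sequential `run_job` driver are all illustrative stand-ins for what a distributed runtime such as Hadoop would execute across a cluster.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    # User-supplied map function: emit (word, 1) for each word in one input split.
    return [(word, 1) for word in document.split()]

def reduce_phase(pairs):
    # User-supplied reduce function: group intermediate pairs by key and sum counts.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

def run_job(splits):
    # Sequential stand-in for the distributed runtime, which would
    # schedule map tasks near the data and shuffle pairs to reducers.
    intermediate = chain.from_iterable(map_phase(s) for s in splits)
    return reduce_phase(intermediate)

if __name__ == "__main__":
    splits = ["big data", "big compute", "data data"]
    print(run_job(splits))  # {'big': 2, 'data': 3, 'compute': 1}
```

The programmer writes only the map and reduce logic; everything the bullet list attributes to the runtime — scheduling, load balancing, communication, and co-locating computation with data — is hidden behind the equivalent of `run_job`.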
