Resource Provisioning and Scheduling of Big Data Processing Jobs

Rajni Aron (Sahyadri College of Engineering and Management, India) and Deepak Kumar Aggarwal (Concordia University, Canada)
DOI: 10.4018/978-1-5225-3142-5.ch014

Abstract

Cloud Computing has become a buzzword in the IT industry. Cloud Computing, which provides inexpensive computing resources on a pay-as-you-go basis, is rapidly gaining momentum as an alternative for traditional Information Technology (IT)-based organizations. As a result, the growing use of Clouds makes the execution of Big Data processing jobs a vital research area. As more and more users store and process their real-time data in Cloud environments, Resource Provisioning and Scheduling of Big Data processing jobs become key considerations for the efficient execution of Big Data applications. This chapter discusses the fundamental concepts behind Cloud Computing and Big Data and the relationship between them. It will help researchers identify the important characteristics of Cloud Resource Management Systems for handling Big Data processing jobs and select the most suitable technique for processing Big Data jobs in a Cloud Computing environment.

Introduction

The year 2007 witnessed the advent of a new term, Cloud Computing, which has since become a buzzword in the IT industry. The market now offers plenty of Cloud technologies and platforms, such as Google App Engine, Microsoft Azure, Manjrasoft Aneka and many more. Cloud Computing, which provides inexpensive computing resources on a pay-as-you-go basis, is rapidly gaining momentum as an alternative for traditional information technology (IT)-based organizations. As a result, the growing use of Clouds makes the execution of Big Data processing jobs a vital research area.

As more and more users store and process their real-time data in Cloud environments, resource provisioning and scheduling of Big Data processing jobs become key considerations for the efficient execution of Big Data applications. Resources are the foundation of any real-time system, and managing them to handle Big Data jobs in a Cloud Computing environment is a demanding task. An inefficient resource management system has a direct negative effect on performance and cost and an indirect effect on the functionality of the system: some functions provided by the system may become too expensive or may be avoided altogether because of poor performance. Thus, Cloud Computing faces the challenge of resource management, especially with respect to choosing resource provisioning strategies and suitable algorithms for a particular application. The major components of a resource management system are resource provisioning, scheduling and load balancing. If a system fulfils the requirements of these three components, the processing of Big Data jobs becomes much easier.
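The three components named above can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not the architecture the chapter goes on to describe: the `Worker` class, the `provision`, `schedule` and `balance` functions, the job list and the `capacity_per_worker` parameter are all hypothetical names chosen for this example, and the policies used (capacity-based sizing, longest-job-first ordering, least-loaded placement) are simple stand-ins for the techniques surveyed later.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Worker:
    """A hypothetical provisioned resource (for example, one virtual machine)."""
    name: str
    assigned: list = field(default_factory=list)

    @property
    def load(self) -> float:
        # Total estimated runtime of the jobs placed on this worker.
        return sum(cost for _, cost in self.assigned)


def provision(jobs, capacity_per_worker):
    """Provisioning: decide how many workers the workload needs."""
    total = sum(cost for _, cost in jobs)
    count = max(1, math.ceil(total / capacity_per_worker))
    return [Worker(f"vm-{i}") for i in range(count)]


def schedule(jobs):
    """Scheduling: decide the order of the jobs (longest first in this sketch)."""
    return sorted(jobs, key=lambda job: job[1], reverse=True)


def balance(ordered_jobs, workers):
    """Load balancing: place each job on the currently least-loaded worker."""
    for job in ordered_jobs:
        target = min(workers, key=lambda w: w.load)
        target.assigned.append(job)
    return workers


# Each job is (name, estimated runtime); the values are made up for illustration.
jobs = [("sort", 4.0), ("join", 3.0), ("scan", 2.0), ("index", 2.0), ("count", 1.0)]
workers = balance(schedule(jobs), provision(jobs, capacity_per_worker=6.0))
loads = {w.name: w.load for w in workers}
print(loads)
```

Keeping the three steps as separate functions mirrors the separation of concerns in the text: each policy can be replaced independently, for instance swapping the longest-first ordering for a deadline-based one without touching provisioning or placement.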

This chapter discusses the fundamental concepts behind Cloud Computing and Big Data and the relationship between them. It examines both computing paradigms in varied contexts, including their underlying characteristics, requirements, types, challenges and solutions. Different cloud service providers are compared for various application paradigms. The chapter presents the essential ideas behind resource provisioning in the Cloud and identifies requirements, based on users' applications, associated with handling real-time data. It also discusses an architecture for dynamic provisioning of resources, based on users' requirements, that aims to maximize the efficiency of Big Data processing jobs and their analysis. This chapter will help researchers identify the important characteristics of cloud resource management systems for handling Big Data processing jobs and select the most suitable technique for processing Big Data jobs in a Cloud Computing environment, and it outlines significant future research directions.

The remainder of this chapter is organized as follows. The section “Relationship between Cloud Computing and Big Data” presents the relationship between Cloud Computing and Big Data. The section “Resource Provisioning for Big Data” discusses resource provisioning for Big Data in detail, including the types of resource provisioning techniques, their requirements and related work in this area. The section “Scheduling for Big Data” reviews the scheduling of Big Data applications, covering types of scheduling, characteristics of scheduling and scheduling in IaaS. The section “Load Balancing for Big Data” presents load balancing techniques for Big Data applications. The section “Open issues and Challenges” presents future research directions in resource provisioning, scheduling and load balancing for Big Data problems in a Cloud Computing environment. The section “Conclusion” concludes the chapter.

Key Terms in this Chapter

Distributed Computing: Distributed Computing is a model in which components of a software system are shared among multiple computers to improve efficiency and performance (Distributed Computing, 2015).

Scalability: Scalability is a characteristic of a system, model or function that describes its capability to cope and perform under an increased or expanding workload. A system that scales well will be able to maintain or even increase its level of performance or efficiency when tested by larger operational demands (Scalability, 2015).

Virtualization: Virtualization is the process of creating a software-based (or virtual) representation of something rather than a physical one. Virtualization can apply to applications, servers, storage and networks and is the single most effective way to reduce IT expenses while boosting efficiency and agility for businesses of all sizes (Vmware, 2016).

Resource Monitoring: The act of collecting information about resources and their past, current and future status.

Cloud Resource Management: The process of allocating computing, storage and networking resources to a set of applications in a manner that aims to fulfil the performance objectives of the applications, the data center providers and the cloud resource users (Brendan, 2014).

Data Centers: Data centers are physical or virtual infrastructure used by enterprises to house computer, server and networking systems and components for the company's Information Technology (IT) needs, which typically involve storing, processing and serving large amounts of mission-critical data to clients in a client/server architecture (Webopedia, 2016).

Public Clouds: A public cloud is one in which a company relies on a third-party cloud service provider for services such as servers, data storage and applications, which are delivered to the company through the Internet. A public cloud can free companies from the potentially high costs of having to purchase, manage and maintain on-premises hardware and software infrastructure (Webopedia, 2015).
