Data-Aware Distributed Batch Scheduling

Tevfik Kosar (University at Buffalo, USA)
DOI: 10.4018/978-1-60566-184-1.ch005

As the data requirements of scientific distributed applications increase, access to remote data becomes the main performance bottleneck for these applications. Traditional distributed computing systems closely couple data placement and computation, and treat data placement as a side effect of computation. Data placement is either embedded in the computation, where it delays the computation, or performed by simple scripts that do not have the privileges of a job. The inability of traditional systems and existing CPU-oriented schedulers to deal with the complex data handling problem has given rise to a new class of schedulers: data-aware schedulers. This chapter discusses the challenges in this area as well as future trends, with a focus on the Stork case study.
Chapter Preview


Modern scientific applications and experiments are becoming increasingly data intensive. Large experiments, such as high-energy physics simulations, genome mapping, and climate modeling, generate data volumes reaching hundreds of terabytes (Hey, 2003). Similarly, remote sensors and satellites are producing extremely large amounts of data for scientists (Tummala & Kosar, 2007; Ceyhan & Kosar, 2007). To process these data, scientists are turning to distributed resources owned by the collaborating parties for the computing power and storage capacity needed to push their research forward. But the use of distributed resources imposes new challenges (Kosar, 2006). Even simply sharing and disseminating subsets of the data to the scientists’ home institutions is difficult. The systems managing these resources must provide robust scheduling and allocation of storage resources, as well as efficient management of data movement.

One key benefit of distributed resources is that they give institutions and organizations access to resources needed for large-scale applications that they would not otherwise have. But to facilitate the sharing of compute, storage, and network resources between collaborating parties, middleware is needed for planning, scheduling, and management of the tasks as well as the resources. The majority of existing research has focused on the management of compute tasks and resources, as they are widely considered to be the most expensive. As scientific applications become more data intensive, however, the management of storage resources and of data movement between storage and compute resources is becoming the main bottleneck. Many jobs executing in distributed environments fail or are inhibited by overloaded storage servers. These failures prevent scientists from making progress in their research.

According to the ‘Strategic Plan for the US Climate Change Science Program (CCSP)’, one of the main objectives of the future research programs should be “Enhancing the data management infrastructure”, since “The users should be able to focus their attention on the information content of the data, rather than how to discover, access, and use it.” (CCSP, 2003). This statement by CCSP summarizes the goal of many cyberinfrastructure efforts initiated by DOE, NSF and other federal agencies, as well as the research direction of several leading academic institutions.

NSF’s ‘Cyberinfrastructure Vision for 21st Century’ states that “The national data framework must provide for reliable preservation, access, analysis, interoperability, and data movement” (NSF, 2006). The same report also says: “NSF will ensure that its efforts take advantage of innovation in large data management and distribution activities sponsored by other agencies and international efforts as well.” According to the NSF report on ‘Research Challenges in Distributed Computing Systems’, “Data storage is a fundamental challenge for large-scale distributed systems, and advances in storage research promise to enable a range of new high-impact applications and capabilities” (NSF, 2005).

It would not be too bold to claim that research and development in computation-oriented distributed computing has reached maturity, and that there is now an obvious shift of focus towards data-oriented distributed computing. This is mainly because existing solutions work very well for computationally intensive applications, but inadequately address applications that access, create, and move large amounts of data over wide-area networks.

Key Terms in this Chapter

Condor: It is a batch scheduling system for computational tasks. It provides a job queuing mechanism and resource monitoring capabilities. It allows the users to specify scheduling policies and enforce priorities.

Stork: It is a specialized scheduler for data placement activities in heterogeneous environments. Stork can queue, schedule, monitor, and manage data placement jobs and ensure that the jobs complete.

Condor-G: It is an extension of Condor, which allows users to submit their jobs to inter-domain resources by using the Globus Toolkit functionality. In this way, user jobs can get scheduled and run not only on Condor resources but also on PBS, LSF, LoadLeveler, and other grid resources.

Distributed Computing: It is a type of parallel computing where different parts of the same application can run on multiple geographically distributed computers.

Batch Scheduling: Scheduling and execution of a series of jobs in the background “batch” mode, without any human interaction.

Data Placement: It encompasses all activities related to data movement, such as transfer, staging, replication, space allocation and de-allocation, registering and unregistering metadata, and locating and retrieving data.

DAGMan: It manages dependencies between tasks in a Directed Acyclic Graph (DAG), where tasks are represented as nodes and the dependencies between tasks are represented as directed arcs between the respective nodes.
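
The DAG idea behind these terms can be illustrated with a small Python sketch. It is not DAGMan or Stork syntax, and it does not use any Condor or Stork API. The task names, the "data"/"compute" labels, and the helper functions execution_order and route are all hypothetical, chosen only to show how a workflow with both data placement and compute tasks might be ordered by its dependencies and how each task might be handed to the appropriate kind of scheduler. It relies on the standard-library graphlib module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each key is a task (a DAG node) and each value is
# the set of tasks it depends on (the incoming arcs). Names are illustrative.
dependencies = {
    "stage_in": set(),           # move input data to the compute site
    "process": {"stage_in"},     # run the computation on the staged data
    "stage_out": {"process"},    # move results back to remote storage
    "cleanup": {"stage_out"},    # release the storage space that was used
}

# Assumed labels: which tasks are data placement and which are computation.
task_kind = {
    "stage_in": "data",
    "process": "compute",
    "stage_out": "data",
    "cleanup": "data",
}


def execution_order(deps):
    """Return one order in which every task follows all of its dependencies.

    Raises graphlib.CycleError if the graph is not acyclic.
    """
    return list(TopologicalSorter(deps).static_order())


def route(order, kinds):
    """Pair each task with the kind of scheduler that would handle it."""
    return [
        (task, "data placement scheduler" if kinds[task] == "data" else "compute scheduler")
        for task in order
    ]


if __name__ == "__main__":
    for task, scheduler in route(execution_order(dependencies), task_kind):
        print(f"{task} -> {scheduler}")
```

In this sketch the data placement steps are ordinary nodes in the graph rather than side effects hidden inside the compute task, which mirrors the separation of data placement from computation that the chapter argues for.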
