Grid Scheduling: Methods, Algorithms, and Optimization Techniques

Grid Scheduling: Methods, Algorithms, and Optimization Techniques

Florin Pop (University Politehnica of Bucharest, Romania)
Copyright: © 2012 |Pages: 26
DOI: 10.4018/978-1-61350-113-9.ch004
OnDemand PDF Download:


This chapter will present the scheduling mechanism in distributed systems with direct application in grids. The resource heterogeneity, the size and number of tasks, the variety of policies, and the high number of constraints are some of the main characteristics that contribute to this complexity. The necessity of scheduling in grid is sustained by the increasing of number of users and applications. The design of scheduling algorithms for a heterogeneous computing system interconnected with an arbitrary communication network is one of the actual concerns in distributed system research. The main concerns presented in the chapter refers to general presentation of scheduling for grid systems, specific requirements of scheduling in grids, critical analysis of existing methods and algorithms for grid schedulers, scheduling policies, fault tolerance in scheduling process in grid environments, scheduling models and algorithms and optimization techniques for grid scheduling.
Chapter Preview


Due to grid systems characteristics, the main issue for grid scheduling is to develop a Meta-Scheduling architecture that encompasses heterogeneous and dynamic clusters. This architecture is a decentralized one and represents the solution for the scheduling problem at a global grid level. At this level, the Quality-of-Service (QoS) constraint is very important. The scheduling methods for decentralized heterogeneous environment are based on heuristics that consider complex applications. The tasks that compose these applications can have different dimensions and can be based on diverse data and control patterns.

The specific requirements of scheduling in distributed systems are: the claims of the resource consumers, the restrictions imposed by the resource owners, the need to continuously adapt to changes in the availability of resources, etc. Based on these requirements, a number of challenging issues that must be addressed must be considered: maximization of system throughput, sites’ autonomy, scalability, fault-tolerance, and quality of services. On the other hand, the motivation for dynamic scheduling is then presented. The basic idea is to perform task allocation on the fly as the application executes. This is useful when it is difficult, if not impossible, to predict the execution time, the branch selected in a decision, and the number of iterations in a loop. So, dynamic scheduling is usually applied when it is difficult to estimate the cost of applications, or when jobs are coming at unpredictable times. The two major functions used in dynamic task scheduling are described, namely the system state estimation (different from the cost estimation in static scheduling) and the decision making.

A large number of tools are available for local grid scheduling: PBS, Condor, Sun Grid Engine, and LSF. These tools are included in the category of centralized schedulers. Instead, the meta-schedulers are the subject of projects under development, like GridWay (that is an incubator project in Globus) and Globus CSF. There is no meta-scheduler accepted and used on a large scale. A problem that must be solved for this type of scheduling is the scalability. This aspect is more important in the context of heterogeneous systems (that require a simultaneous management of multiple clusters) and of the diversity of middleware tools.

Large distributed systems with many different administrative domains will most likely have different resource utilization policies. Thus it is unlikely that a fixed scheduling policy will suffice for different needs. The system and application oriented policies for dynamic scheduling are important for tasks with dependencies as well. The system oriented policies need monitoring information for applications scheduling and execution in grids. This section will describe the policies which consider Quality-of-Services constrains. QoS is a requirement for many grid applications. QoS might refer to the response time, the necessary memory, etc. It might happen that these requirements are satisfied only by specific resources, so that they only these resources can be assigned for that application. Situations might become more complex when there are more tasks having QoS requirements, and several resources exist which satisfy them. The resource allocation under QoS constrains is another subject for the optimization process. Other type of policy for scheduling in grid computing must also take into account additional issues such as the resource owners' requirements, the need to continuously adapt to changes in the availability of resources, and so on. In these cases, a number of challenging issues need to be addressed: maximization of system throughput and user satisfaction, the sites' autonomy (the grid is composed of resources owned by different users, which retain control over them), and scalability.

The fault tolerance is also important in grid. Two of the problems related to re-scheduling are the high cost and the lack of coping with dependent tasks. For computational intensive tasks, re-scheduling the original schedule can improve the performance. But, re-scheduling is usually costly, especially in Directed Acyclic Graphs (DAGs) where there are extra data dependencies among tasks. Current research on DAG rescheduling leaves a wide open area on optimization for the scheduling algorithms.

In many cases, the data must be transported to the place where tasks will be executed. Consequently, scheduling algorithms should consider not only the task execution time, but also the data transfer time for finding a more realistic mapping of tasks. Only a handful of current research efforts consider the simultaneous optimization of computation and data transfer scheduling.

Complete Chapter List

Search this Book: