Two Approaches for Workflow Scheduling with Quality of Service in the Grid

Two Approaches for Workflow Scheduling with Quality of Service in the Grid

Fangpeng Dong (Queen’s University, Canada) and Selim G. Akl (Queen’s University, Canada)
DOI: 10.4018/978-1-4666-0879-5.ch601
OnDemand PDF Download:
$30.00
List Price: $37.50

Abstract

Over the past decade, Grid Computing has earned its reputation by facilitating resource sharing in larger communities and providing non-trivial services. However, for Grid users, Grid resources are not usually dedicated, which results in fluctuations of available performance. This situation raises concerns about the quality of services (QoS). The meaning of QoS varies with different concerns of different users. Objective functions that drive job schedulers in the Grid may be different from each other as well. Some are system-oriented, which means they make schedules to favor system metrics such as throughput, load-balance, resource revenue and so on. To narrow the scope of the problem to be discussed in this chapter and to make the discussion substantial, the scheduling objective function considered is minimizing the total completion time of all tasks in a workflow (also known as the makespan). Correspondingly, the meaning of QoS is restricted to the ability that scheduling algorithms can shorten the makespan of a workflow in an environment where resource performance is vibrant. This chapter introduces two approaches that can provide QoS features at the workflow scheduling algorithm level in the Grid. One approach is based on a workflow rescheduling technique, which can reallocate resources for tasks when a resource performance change is observed. The other copes with the stochastic performance change using pre-acquired probability mass functions (PMF) and produces a probability distribution of the final schedule length, which will then be used to handle the different QoS concerns of the users.
Chapter Preview
Top

1. Introduction

The development of Grid infrastructures, for example, Condor DAGMan (DAGMan), Grid Flow (Cao, 2003) and Pegasus (Deelman, 2004), now enables workflow to be submitted and executed on remote computational resources. To harness the non-trivial power of Grid resources, effective task scheduling approaches are necessary. Typically, there are two ways to map a set of tasks to a set of resources: dynamic scheduling and static scheduling. The dynamic approach makes a schedule at real time when a task is ready, while the static approach makes a schedule by planning in advance based on available information on resources and tasks. Because of its advantage of “planning ahead”, a static scheduler can produce a schedule that overlaps communication time and computational time. Therefore, the makespan can be reduced. It can even achieve a near optimal schedule for some complex scientific workflows involving a larger number of computational tasks and data communications (Wieczorek, 2005). However, unlike dynamic scheduling, the static approach relies heavily on accurate resource and task information to make a schedule. Unfortunately, the performance of Grid resources is hard to accurately predict. The difficulty is due to the following facts. First, Grid resources are not dedicated to one particular Grid user. Resources’ local management policies and competition from other users may make the performance acquired by a user fluctuate. Second, existing Grid resources may fail without a notice and new Grid resources may join the resource pool; both cases will change the available performance for users.

In order to benefit from the merits of static scheduling for complex workflows in the Grid, new performance prediction and modeling methods are introduced. The differences of the presentation of stochastic performance information (e.g., deterministic vs. non-deterministic) influence the way that scheduling algorithms can take to meet QoS requirements.

The rescheduling approach (Yu, 2007; Sakellarion, 2004) can modify an initial schedule according to the fluctuation of resource performance at the run-time. This approach needs to first make an initial schedule and in order to make such an initial schedule, the algorithm usually relies on performance prediction mechanisms (McGough, 2005) to acquire the estimated performance that the tasks will get when they are running on the resources. This predicted performance is usually given as a deterministically specific value, although this value might not be the exact performance that the application will archive at the run-time. For example, the approach that will be introduced in this chapter uses the PFAS algorithm (Dong, 2007) to get an initial schedule and the PFAS algorithm needs information on resource performance in different time slots. Figure 1 presents an example of the fluctuation of available performance in different time slots.

Figure 1.

Performance fluctuation resulted from advance 
reservation on a resource along time

Besides the resource performance model that is based on deterministic prediction, there is an alternative way which is nondeterministic. Instead of specifying on the value of available performance, this approach tries to describe the performance by some probability functions, which can be derived from task execution records in the past (e.g., log files) (Li, 2007). Particularly, in the algorithm introduced in this chapter, we assume that the performance of a resource is a discrete random variable and its probability mass functions (PMF) can be known prior to scheduling from Grid resource information services. For example, Figure 2 shows the PMF of the performance of a resource, which has four possible values.

Figure 2.

Performance probability mass function of a 
computational resource in the Grid

Complete Chapter List

Search this Book:
Reset