Investigating Deadline-Driven Scheduling Policy via Simulation with East

Justin M. Wozniak (University of Notre Dame and Argonne National Laboratory, USA) and Aaron Striegel (University of Notre Dame, USA)
DOI: 10.4018/978-1-60566-370-8.ch008


Opportunistic techniques have been widely used to create economical computation infrastructures and have demonstrated an ability to deliver heterogeneous computing resources to large batch applications; however, batch turnaround performance is generally unpredictable, which degrades the experience of users of widely shared computing resources. Scheduler prioritization schemes can effectively boost the share of the system given to particular users, but to meaningfully improve user experience, whole batches must complete on a predictable schedule, not just individual jobs. Additionally, batches may contain a dependency structure that must be considered when predicting or controlling the completion time of the whole workflow; the slowest or most volatile prerequisite job determines performance. In this chapter, a probabilistic policy enforcement technique is used to protect deadline guarantees against grid resource unpredictability as well as inaccurate runtime estimates. Methods for allocating processors in barrier scheduling, a common workflow subcase, are also presented.

1 Introduction

Running complex applications on widely distributed resources is an unpredictable process, which complicates the user experience with new systems. While opportunistic technologies and grid infrastructures dramatically increase the resources available to the application, they also increase the range and volatility of the resulting behaviors. Job turnaround time, the span between the time a job is ready to run and the time its results are returned, is of primary importance to users seeking larger, more powerful computing platforms, but it is more difficult to measure on conglomerations of heterogeneous computation elements. The fact that users cannot easily achieve predictable turnaround, even when more parallelism is available and average-case performance improves, can cause frustration and reduce interest in new, complex distributed computing systems.

There is a fundamental disconnect between short-term user objectives and long-term system design techniques that underlies many user frustrations with commodity computing systems. Users prefer responsive, fast turnarounds for specific workloads. Grid system designers and administrators take a long view, intending to maximize utilization, and thus the return on investment, of a given resource set across a wide range of applications. These viewpoints are reflected in the available technologies. For example, opportunistic systems (Thain, 2004), which excel at improving system utilization, start by locating idle resources and employing them to perform useful work. Real-time systems (Murthy, 2001), by contrast, start by admitting acceptable workloads and ensuring that the results will be returned on schedule. Ordinary workstations are a middle ground, providing a resource that is always available to its owner and that offers moderate predictability through system simplicity and isolation from external forces.

1.1 Sources of Unpredictability

We start with an examination of the underlying causes of unpredictable performance in opportunistic systems.

  • Processor heterogeneity: Since users of opportunistic systems often employ resources owned and managed by diverse organizations, heterogeneity is an ever-present challenge. CPU heterogeneity takes two forms: “hard” heterogeneity, meaning architectural differences, and “soft” heterogeneity, meaning performance differences within a class of compatible architectures. 
 While users can benefit from existing matchmaking techniques to select certain processors, opportunistic systems have been designed for “compute-hungry” users. These users are capable of consuming an ever-growing number of processors. Thus, they are expected to overcome hard heterogeneity obstacles by compiling for multiple architectures or using portable interpreted languages. Consequently, the impact of heterogeneous systems can be modelled by simply considering performance differences.

  • Contention: Opportunistic systems attempt to harness the aggregate computing ability of large numbers of processors for large numbers of users. The workload requested of such a system can thus be quite variable, particularly in small-scale, experimental settings. For example, in a typical university-sized Condor installation, a user may initially have access to all of the available machines but then unexpectedly be forced to share them with another user.
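The two effects above can be made concrete with a small model. The sketch below is hypothetical and is not the chapter's simulation tool: it assumes soft heterogeneity only (a job's runtime is its work divided by the processor's relative speed, following the observation that heterogeneity can be modelled through performance differences), and it greedily assigns each job to whichever processor becomes free first. Contention is modelled simply by shrinking the list of processors. The function name `batch_turnaround` and the speed values are illustrative assumptions.

```python
import heapq


def batch_turnaround(job_work, speeds):
    """Return the time at which the last job of a batch finishes.

    Hypothetical model: each processor has a relative speed, a job with
    `work` units runs for work / speed, and each job is greedily placed
    on the processor that becomes free earliest (ties broken by index).
    """
    # Min-heap of (time the processor becomes free, processor index).
    free_at = [(0.0, i) for i in range(len(speeds))]
    heapq.heapify(free_at)
    finish = 0.0
    for work in job_work:
        start, proc = heapq.heappop(free_at)
        end = start + work / speeds[proc]
        finish = max(finish, end)
        heapq.heappush(free_at, (end, proc))
    return finish


if __name__ == "__main__":
    jobs = [10.0] * 8
    # Two slow and two fast machines (soft heterogeneity).
    print(batch_turnaround(jobs, [1.0, 1.0, 2.0, 2.0]))  # 20.0
    # Another user claims the two fast machines (contention).
    print(batch_turnaround(jobs, [1.0, 1.0]))  # 40.0
```

With eight 10-unit jobs on two slow and two fast machines, the greedy policy finishes at time 20, although placing three jobs on each fast machine and one on each slow machine would finish at 15; if a second user then claims both fast machines, the same batch takes 40. The spread between these outcomes is the kind of turnaround variability a user of an opportunistic pool experiences.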

Heterogeneous, opportunistic systems thus pose two significant performance predictability problems for users. First, predicting the performance of a given task on a range of potential architectures is a labor-intensive process that is not standard practice in distributed computing, nor is it appealing to potential new users of complex systems. Second, since micromanaging job distribution is also a complex undertaking that must take into account the contention for resources among multiple users, the actual execution site of a task in an opportunistic system is often left to the metascheduler. Consequently, while the benefits of tight runtime estimates are clear, modern systems must recognize that in the typical case they will rely on rough user-supplied estimates, which are not particularly trustworthy given the unpredictable allocation of imperfectly understood processors.
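The problem of untrustworthy estimates, combined with the observation that a batch is only as fast as its slowest prerequisite, can be illustrated with a Monte Carlo sketch of the probability that a barrier (a set of jobs that must all finish before the workflow proceeds) meets its deadline. This is not the chapter's probabilistic policy enforcement technique; the lognormal error model, the `sigma` spread, the assumption of independent jobs running in parallel, and the name `prob_meets_deadline` are all hypothetical choices made for illustration.

```python
import random


def prob_meets_deadline(estimates, deadline, sigma=0.5, trials=10_000, seed=0):
    """Monte Carlo estimate of P(a barrier completes by `deadline`).

    Hypothetical assumptions: every prerequisite job runs in parallel on
    its own processor, and each actual runtime is the user's estimate
    multiplied by an independent lognormal error factor with median 1.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # The barrier is released only when its slowest prerequisite finishes.
        completion = max(est * rng.lognormvariate(0.0, sigma) for est in estimates)
        if completion <= deadline:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # One job whose deadline equals its estimate: about an even chance.
    print(prob_meets_deadline([10.0], deadline=10.0))
    # Ten such jobs behind one barrier: nearly always late.
    print(prob_meets_deadline([10.0] * 10, deadline=10.0))
    # Doubling the deadline raises the probability substantially.
    print(prob_meets_deadline([10.0] * 10, deadline=20.0))
```

Under this model a single job given exactly its estimated time meets the deadline about half the time, while ten such jobs behind one barrier succeed only with probability about 0.5**10 (roughly 0.001). Guaranteeing a deadline for a whole batch therefore requires slack beyond the individual estimates.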
