A Replica Based Co-Scheduler (RBS) for Fault Tolerant Computational Grid

Zahid Raza, Deo P. Vidyarthi

Source Title: Cloud, Grid and High Performance Computing: Emerging Applications

DOI: 10.4018/978-1-60960-603-9.ch007

Abstract

A Grid is a parallel and distributed computing network system comprising heterogeneous computing resources spread over multiple administrative domains that offers high-throughput computing. Since the Grid operates at a large scale, there is always a possibility of failure, ranging from hardware to software. The penalty paid for these failures may be very large. The system needs to tolerate the various failures which, in spite of many precautions, are bound to happen. Replication is a strategy often used to introduce fault tolerance into the system and to ensure successful execution of a job even when some of the computational resources fail. Though replication incurs a heavy cost, a selective degree of replication can offer a good compromise between performance and cost. This chapter proposes a co-scheduler that can be integrated with the main scheduler for the execution of jobs submitted to a computational Grid. The main scheduler may have any performance optimization criterion; integrating the co-scheduler adds fault tolerance on top of it. The chapter evaluates the performance of the co-scheduler with a main scheduler designed to minimize the turnaround time of a modular job, introducing module replication to counter the effects of node failures in a Grid. The simulation study reveals that the model works well under various conditions, resulting in a graceful degradation of the scheduler’s performance while improving the overall reliability offered to the job.

Introduction

Because computational resources are scarce, they must be used efficiently. Resources range from specialized computational machines and storage machines to heterogeneous applications. A grid aggregates resources across the world seamlessly and enables their use as, when, and wherever desired, rather than each group investing heavily in its own high-performance computational resources. In the era of high-performance and high-throughput computing, the grid has emerged as an efficient means of connecting distributed computers or resources scattered all over the world for collaborative computing. It unifies various heterogeneous resources on a common platform and diminishes administrative boundaries to give the user transparent access. Being a part of the grid essentially means a virtually unlimited capability to execute any kind of job anywhere. Therefore, even if the user does not have the appropriate computational capabilities, the grid lets the job execute on the right resources, making it both efficient and cost-effective.

Depending on their use, grids can be classified as computational grids, data grids, sensor grids, biological grids, etc. A computational grid emphasizes the computing aspect: it schedules the job onto grid resources by exploring the job's computational requirements and balancing the load effectively. Scheduling can be based on various objectives, such as maximizing the reliability of job execution, minimizing the makespan, or maximizing the Quality of Service (QoS) for the job execution (Grid Computing Info centre, 2008; Baker, Buyya, & Laforenza, 2002; Tarricone & Esposito, 2005; Ernemann, Hamscher, & Yahyapour, 2002; Casanova, 2002; Vidyarthi, Sarker, Tripathi & Yang, 2009; Raza & Vidyarthi, 2008, 2009).

Executing a job on the complex and dynamic grid poses a number of challenges. One of these is to provide a reliable environment for the job so that it can cope with any kind of failure. Since grid resources are heterogeneous in behavior and administrative control, introducing fault tolerance into the system is very difficult. In addition, the jobs submitted to the grid may themselves be very complex and may take a long time to execute, making them vulnerable to failures. Further, the resources are under user control, so accidental damage or even a forced shutdown may cause the execution to fail. The same is true of network failures. Failures may thus occur in the hardware, the software, or the network. Fault-tolerance techniques accordingly vary from proactive to reactive approaches to counter failure at any level (Dai, Xie, & Poh, 2002; Huda, Schmidt & Peake, 2005; Mujumdar, Bheevgade, Malik & Patrikar, 2008). In spite of these measures, the possibility of failure cannot be ruled out. The desired objective is to accept these failures and minimize their effect by degrading the system gracefully, continuing job execution at the cost of a compromised overall performance. One of the popular mechanisms for handling failures is replication. It can take a hardware form or a software form, in which the same application is executed or stored at more than one resource. Therefore, with a slight increase in the execution cost, replication increases the probability of successful execution of the job and so makes the system fault tolerant.
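The effect of replication on the probability of successful execution can be made concrete with a small calculation. The sketch below is an illustration only, not the chapter's reliability model: it assumes that replicas fail independently and that each replica has a known success probability, and the function name and the sample values are hypothetical.

```python
import math

def replicated_success_probability(success_probs):
    """Probability that at least one replica of a module succeeds.

    Assumption (not from the chapter): replica failures are independent,
    so the module fails only if every replica fails.
    """
    all_fail = math.prod(1.0 - p for p in success_probs)
    return 1.0 - all_fail

# Hypothetical values: a module on a node with success probability 0.9,
# then the same module with one extra replica on a node with 0.8.
single = replicated_success_probability([0.9])
replicated = replicated_success_probability([0.9, 0.8])
print(single, replicated)  # the replicated value is approximately 0.98
```

Under these assumptions a single replica on a moderately reliable node already raises the success probability noticeably, which is why the cost of the extra execution may be worth paying.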

Replication incurs a heavy cost, but this cost can be reduced by adopting selective replication. The selection of nodes or job modules depends on parameters that the system can decide according to the scheduling requirements. The Replica Based Co-Scheduler (RBS) works by replicating some of the modules allocated to a node with a high failure rate onto nodes with lower failure rates. It therefore increases the fault tolerance of the system without severely affecting performance.
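The idea of selective replication described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the RBS algorithm itself, whose selection parameters this preview does not specify. The `Node` type, the `plan_replicas` function, the failure-rate threshold, and the rule of choosing the single most reliable node as the replica target are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    name: str
    failure_rate: float  # hypothetical unit: expected failures per unit time

def plan_replicas(allocation, nodes, threshold):
    """Choose a replica node for each module placed on an unreliable node.

    allocation: {module: node name} as produced by some main scheduler.
    threshold:  assumed cut-off; modules on nodes whose failure rate
                exceeds it are replicated, all others are left alone.
    Returns {module: replica node name}.
    """
    by_name = {n.name: n for n in nodes}
    replicas = {}
    for module, primary_name in allocation.items():
        primary = by_name[primary_name]
        if primary.failure_rate <= threshold:
            continue  # reliable enough, so no replication cost is paid
        # Only nodes strictly more reliable than the primary qualify.
        candidates = [n for n in nodes if n.failure_rate < primary.failure_rate]
        if candidates:
            best = min(candidates, key=lambda n: n.failure_rate)
            replicas[module] = best.name
    return replicas

nodes = [Node("n1", 0.05), Node("n2", 0.01), Node("n3", 0.002)]
allocation = {"m1": "n1", "m2": "n2", "m3": "n1"}
plan = plan_replicas(allocation, nodes, threshold=0.02)
print(plan)  # {'m1': 'n3', 'm3': 'n3'}
```

Only the two modules on the failure-prone node `n1` are replicated, so the extra cost is paid where it buys the most reliability. A fuller scheduler would presumably also weigh the load and turnaround time on the target node, which this sketch ignores.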

This chapter has six sections. The next section discusses related work with similar objectives reported in the literature, followed by a section elaborating on the need for RBS and its integration with a main scheduler. The working of the model is then illustrated with a suitable example, along with details of the results obtained from the simulation study. The chapter concludes by detailing the achievements and drawbacks of the work.
