Optimizing Fault Tolerance for Multi-Processor System-on-Chip

Dimitar Nikolov, Mikael Väyrynen, Urban Ingelsson, Virendra Singh, Erik Larsson

Source Title: Design and Test Technology for Dependable Systems-on-Chip

DOI: 10.4018/978-1-60960-212-3.ch003

Abstract

Rapid development in semiconductor technologies makes it possible to manufacture integrated circuits (ICs) with multiple processors, so-called Multi-Processor Systems-on-Chip (MPSoCs). At the same time, ICs manufactured in recent semiconductor technologies are becoming increasingly susceptible to transient faults, which makes fault tolerance necessary. Work on fault tolerance has mainly focused on safety-critical applications; however, the development of semiconductor technologies means that fault tolerance is also needed for general-purpose systems. Unlike safety-critical systems, where meeting hard deadlines is the main requirement, general-purpose systems benefit more from minimizing the average execution time (AET). The contribution of this chapter is twofold. First, the authors present a mathematical framework for the analysis of AET. The analysis covers voting, rollback recovery with checkpointing (RRC), and the combination of RRC and voting (CRV). For a given job and soft (transient) error probability, the authors define mathematical formulas for each of these fault-tolerant techniques with the objective of minimizing AET while taking bus communication overhead into account. For a given number of processors and jobs, the authors also define integer linear programming models that minimize AET, including communication overhead. Second, because the error probability is not known at design time and can change during operation, the authors present two techniques, periodic probability estimation (PPE) and aperiodic probability estimation (APE), that estimate the error probability and adjust the fault-tolerant scheme while the IC is in operation.

Chapter Preview

1. Introduction

The rapid development in semiconductor technologies has enabled fabrication of integrated circuits (ICs) that can include multiple processors, referred to as multi-processor systems-on-chip (MPSoCs). The drawback of this development is that ICs are becoming increasingly sensitive to soft (temporary) errors that manifest themselves when the IC is in operation (Kopetz, Obermaisser, Peti, & Suri, 2004), (Sosnowski, 1994). The soft error rate has increased by orders of magnitude compared with earlier technologies, and the rate is expected to grow in future semiconductor technologies (Borel, 2009). It is becoming increasingly important to consider techniques that enable detection of, and recovery from, soft errors (Borel, 2009), (Borkar, 1999), (Mukherjee, 2008). In this chapter we focus on fault-tolerant techniques addressing soft errors (Borel, 2009), (Chandra & Aitken, 2008).

Fault tolerance has been a subject of research for a long time. As early as 1952, John von Neumann introduced a redundancy technique called NAND multiplexing for constructing reliable computation from unreliable devices (von Neumann, 1956). A significant amount of work has been produced over the years. For example, researchers have shown that schedulability of an application can be guaranteed for pre-emptive on-line scheduling in the presence of a single transient fault (Bertossi & Mancini, 1994), (Burns, Davis, & Punnekkat, 1996), (Han, Shin, & Wu, 2003), (Zhang & Chakrabarty, 2006). Punnekkat et al. assume that a fault can adversely affect only one job at a time (Punnekkat, Burns, & Davis, 2001). Kandasamy et al. consider a fault model which assumes that only a single transient fault may occur on any of the nodes during execution of an application (Kandasamy, Hayes, & Murray, 2003). This model has been generalized in the work of Pop et al. to a number k of transient faults (Pop, Izosimov, Eles, & Peng, 2005). Most work in the area of fault tolerance has focused on safety-critical systems and the optimization of such systems (Al-Omari, Somani, & Manimaran, 2001), (Bertossi, Fusiello, & Mancini, 1997), (Pop, Izosimov, Eles, & Peng, 2005). For example, the architecture of the fighter JAS 39 Gripen contains seven hardware replicas (Alstrom & Torin, 2001). For a general-purpose (non-safety-critical) system such as a mobile phone, redundancy on the scale of the seven hardware replicas used in the JAS 39 Gripen is too costly. For general-purpose systems, the average execution time (AET) is more important than meeting hard deadlines. For example, a mobile phone user can usually accept a slight, temporary performance degradation in exchange for error-free operation.

There are two major drawbacks with existing work. First, for general-purpose systems there is no framework that can analyze, and guide, the extent to which fault tolerance should be used while taking cost (performance degradation and bus communication) into account. Second, existing approaches depend on a known error probability; however, the error probability is not known at design time, it differs between ICs, and it is not constant through the lifetime of an IC due to, for example, aging and the environment in which the IC is used (Cannon, KleinOsowski, Kanj, Reinhardt, & Joshi, 2008), (Karnik, Hazucha, & Patel, 2004), (Koren & Krishna, 1979), (Lakshminarayanan, 1999).
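
The trade-off behind the first drawback can be illustrated with a small sketch. It is a simplified toy model and not the chapter's framework: it assumes a job of length `T` is split into `n` equal segments, each attempt at a segment pays a checkpoint overhead `tau`, errors strike independently with probability `p_unit` per time unit, and a segment with a detected error is simply re-executed. All names and parameters (`expected_time`, `best_checkpoint_count`, `tau`, `p_unit`, `max_n`) are hypothetical and chosen for illustration only. Under these assumptions the number of attempts per segment is geometric, so the expected time is `n * (T/n + tau) / (1 - p_unit) ** (T/n)`.

```python
def expected_time(T, n, tau, p_unit):
    """Expected completion time in the toy model (assumption, not the chapter's formula).

    T      -- fault-free job length in time units
    n      -- number of equal segments, each ended by a checkpoint
    tau    -- checkpoint overhead paid on every attempt at a segment
    p_unit -- probability of a soft error per time unit (assumed independent)
    """
    segment = T / n
    # A segment succeeds only if no error occurs during any of its time units.
    p_success = (1.0 - p_unit) ** segment
    # Attempts per segment follow a geometric distribution: mean 1 / p_success.
    return n * (segment + tau) / p_success


def best_checkpoint_count(T, tau, p_unit, max_n=100):
    """Brute-force search for the segment count with the lowest expected time."""
    return min(range(1, max_n + 1),
               key=lambda n: expected_time(T, n, tau, p_unit))


if __name__ == "__main__":
    T, tau, p_unit = 100.0, 1.0, 0.01
    n = best_checkpoint_count(T, tau, p_unit)
    print("segments:", n, "expected time:", expected_time(T, n, tau, p_unit))
```

Even in this toy setting the two costs pull in opposite directions: more checkpoints add overhead, while fewer checkpoints make each re-execution longer and more likely. The best segment count also shifts with `p_unit`, which is why a scheme tuned for one assumed error probability can perform poorly when the actual probability differs or drifts over time.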
