Optimizing Fault Tolerance for Multi-Processor System-on-Chip

Dimitar Nikolov, Mikael Väyrynen, Urban Ingelsson, Virendra Singh, Erik Larsson

Source Title: Design and Test Technology for Dependable Systems-on-Chip

ISBN13: 9781609602123 | ISBN10: 1609602129 | EISBN13: 9781609602147

DOI: 10.4018/978-1-60960-212-3.ch003

MLA

Nikolov, Dimitar, et al. "Optimizing Fault Tolerance for Multi-Processor System-on-Chip." Design and Test Technology for Dependable Systems-on-Chip, edited by Raimund Ubar, et al., IGI Global, 2011, pp. 66-91. https://doi.org/10.4018/978-1-60960-212-3.ch003

APA

Nikolov, D., Väyrynen, M., Ingelsson, U., Singh, V., & Larsson, E. (2011). Optimizing Fault Tolerance for Multi-Processor System-on-Chip. In R. Ubar, J. Raik, & H. Vierhaus (Eds.), Design and Test Technology for Dependable Systems-on-Chip (pp. 66-91). IGI Global. https://doi.org/10.4018/978-1-60960-212-3.ch003

Chicago

Nikolov, Dimitar, et al. "Optimizing Fault Tolerance for Multi-Processor System-on-Chip." In Design and Test Technology for Dependable Systems-on-Chip, edited by Raimund Ubar, Jaan Raik, and Heinrich Theodor Vierhaus, 66-91. Hershey, PA: IGI Global, 2011. https://doi.org/10.4018/978-1-60960-212-3.ch003

Abstract

Rapid development in semiconductor technologies makes it possible to manufacture integrated circuits (ICs) with multiple processors, so-called Multi-Processor Systems-on-Chip (MPSoCs). At the same time, ICs manufactured in recent semiconductor technologies are becoming increasingly susceptible to transient faults, which makes fault tolerance necessary. Work on fault tolerance has mainly focused on safety-critical applications; however, the development of semiconductor technologies makes fault tolerance necessary for general-purpose systems as well. Unlike safety-critical systems, where meeting hard deadlines is the main requirement, general-purpose systems benefit more from minimizing the average execution time (AET). The contribution of this chapter is twofold. First, the authors present a mathematical framework for the analysis of AET. The analysis covers voting, rollback recovery with checkpointing (RRC), and the combination of RRC and voting (CRV). For a given job and soft (transient) error probability, the authors define mathematical formulas for each of these fault-tolerant techniques with the objective of minimizing AET while taking bus communication overhead into account. For a given number of processors and jobs, they also define integer linear programming models that minimize AET, including communication overhead. Second, because the error probability is not known at design time and can change during operation, they present two techniques, periodic probability estimation (PPE) and aperiodic probability estimation (APE), that estimate the error probability and adjust the fault-tolerant scheme while the IC is in operation.
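To give a feel for the trade-off behind minimizing AET under RRC, the following is a minimal sketch of a simplified model, not the chapter's own formulas. It assumes a single processor, errors that occur independently, a job of fault-free length `T` that fails with probability `p_job` when run without checkpoints, `n` equal segments each followed by a checkpoint costing `tau`, and re-execution of a segment until it completes without error. Bus communication, voting, and comparison between processors are deliberately left out, and all names (`aet_rollback`, `best_checkpoint_count`, `T`, `p_job`, `tau`, `n_max`) are hypothetical.

```python
def aet_rollback(T, p_job, tau, n):
    """Average execution time of a job split into n checkpointed segments.

    Simplifying assumptions (not taken from the chapter):
    - p_job is the probability that the whole job, run once without
      checkpoints, is hit by at least one error;
    - errors are independent, so one segment attempt is error-free with
      probability (1 - p_job) ** (1 / n);
    - a faulty segment is re-executed until it succeeds, which gives a
      geometric number of attempts with mean 1 / q_segment.
    """
    q_segment = (1.0 - p_job) ** (1.0 / n)  # error-free probability per attempt
    segment_cost = T / n + tau              # execution plus checkpoint overhead
    return n * segment_cost / q_segment     # n segments, 1/q attempts each


def best_checkpoint_count(T, p_job, tau, n_max=1000):
    """Brute-force search for the segment count that minimizes the AET."""
    return min(range(1, n_max + 1),
               key=lambda n: aet_rollback(T, p_job, tau, n))


if __name__ == "__main__":
    T, p_job, tau = 100.0, 0.5, 5.0
    n_opt = best_checkpoint_count(T, p_job, tau, n_max=50)
    print(n_opt, aet_rollback(T, p_job, tau, n_opt))
```

In this toy model, more checkpoints shorten the work lost per error but add overhead, so the AET has a minimum at some intermediate `n` whenever both `p_job` and `tau` are positive. With `p_job = 0` the overhead term dominates and a single segment is best.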
