Device-Level Majority von Neumann Multiplexing

Valeriu Beiu, Walid Ibrahim, Sanja Lazarova-Molnar
Copyright: © 2009 | Pages: 9
DOI: 10.4018/978-1-59904-849-9.ch072

Abstract

This chapter starts from an exact gate-level reliability analysis of von Neumann multiplexing using majority gates of increasing fan-ins (∆ = 3, 5, 7, 9, 11) at the smallest redundancy factors (RF = 2∆), and details an accurate device-level analysis. The analysis complements well-known theoretical and simulation results. The gate-level analysis is exact, being obtained by exhaustive counting. Extending the exact gate-level analysis to device-level errors allows us to analyze von Neumann majority multiplexing with respect to device malfunctions. These results explain abnormal behaviors of von Neumann multiplexing previously reported on the basis of Monte Carlo simulations. They also show that device-level reliability results are quite different from the gate-level ones, which could have profound implications for future (nano)circuit designs.
Chapter Preview

Introduction

This chapter starts from an exact gate-level reliability analysis of von Neumann multiplexing using majority gates of increasing fan-ins (∆ = 3, 5, 7, 9, 11) at the smallest redundancy factors (RF = 2∆), and details an accurate device-level analysis. The analysis complements well-known theoretical and simulation results. The gate-level analysis is exact, being obtained by exhaustive counting. Extending the exact gate-level analysis to device-level errors allows us to analyze von Neumann majority multiplexing with respect to device malfunctions. These results explain abnormal behaviors of von Neumann multiplexing previously reported on the basis of Monte Carlo simulations. They also show that device-level reliability results are quite different from the gate-level ones, which could have profound implications for future (nano)circuit designs.
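To make the gate-level setting concrete, here is a minimal sketch (not the chapter's exhaustive-counting analysis) of the standard simplified model: each of the ∆ inputs of a MAJ-∆ gate is assumed wrong independently with probability p_in, and the gate itself flips its output with probability eps_gate (von Neumann's symmetric fault model). Iterating this single-gate map approximates successive multiplexing stages when wire errors are treated as independent; all function names and parameter values below are illustrative.

```python
from math import comb

def maj_output_error(p_in: float, eps_gate: float, fan_in: int) -> float:
    """Output error probability of a single MAJ-`fan_in` gate whose inputs are
    each wrong independently with probability `p_in`, and whose output is then
    flipped with probability `eps_gate` (von Neumann's symmetric fault model)."""
    # Probability that a strict majority of the (odd) fan_in inputs is wrong.
    p_maj_wrong = sum(
        comb(fan_in, k) * p_in ** k * (1.0 - p_in) ** (fan_in - k)
        for k in range(fan_in // 2 + 1, fan_in + 1)
    )
    # A gate flip either corrupts a correct result or "repairs" a wrong one.
    return p_maj_wrong * (1.0 - eps_gate) + (1.0 - p_maj_wrong) * eps_gate

def bundle_error_after_stages(p0: float, eps_gate: float, fan_in: int,
                              stages: int = 20) -> float:
    """Iterate the single-gate map over successive multiplexing stages,
    treating wire errors as independent (an idealisation of vN-MUX)."""
    p = p0
    for _ in range(stages):
        p = maj_output_error(p, eps_gate, fan_in)
    return p

if __name__ == "__main__":
    for fan_in in (3, 5, 7, 9, 11):
        p_final = bundle_error_after_stages(p0=0.01, eps_gate=0.005, fan_in=fan_in)
        print(f"MAJ-{fan_in}: bundle error ≈ {p_final:.6f}")
```

Under these idealised assumptions the bundle error settles just above the assumed gate error, slightly closer to it for larger fan-ins; the chapter's exact gate-level and device-level analyses refine this picture.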

SIA (2005) predicts that the semiconductor industry will continue its success in scaling CMOS for a few more generations. This scaling should become very difficult when approaching 16 nm. Scaling might continue further, but alternative nanodevices might be integrated with CMOS on the same platform. Besides the higher sensitivities of future ultra-small devices, the simultaneous increase in their numbers will create ripe conditions for an inflection point in the way we deal with reliability.

As geometries shrink, the available reliability margins of future nano(devices) are being considerably reduced (Constantinescu, 2003; Beiu et al., 2004). From the chip designers’ perspective, reliability currently manifests itself as time-dependent uncertainties and variations of electrical parameters. In the nano-era, these device-level parametric uncertainties are becoming too large to handle with prevailing worst-case design techniques without incurring significant penalties in terms of area, delay, and power/energy. The global picture is that reliability looks like one of the greatest threats to the design of future ICs. For emerging nanodevices and their associated interconnects, the anticipated probabilities of failure could make future nano-ICs prohibitively unreliable. The present design approach, built on the conventional zero-defect foundation, is being seriously challenged. Therefore, fault- and defect-tolerance techniques will have to be considered from the early design phases.

Reliability for beyond-CMOS technologies (Hutchby et al., 2002; Waser, 2005) is expected to get even worse, as device failure rates are predicted to be as high as 10% for single electron technology, or SET (Likharev, 1999), going up to 30% for self-assembled DNA (Feldkamp & Niemeyer, 2006; Lin et al., 2006). Additionally, a comprehensive analysis of carbon nanotubes for future interconnects (Massoud & Nieuwoudt, 2006) estimated the variations in delay at about 60% of the nominal value. Recently, defect rates of 60% were reported for a 160 Kbit molecular electronic memory (Green et al., 2007). Achieving 100% correctness with 10¹² nanodevices will be not only outrageously expensive, but plainly impossible! Relaxing the requirement of 100% correctness should reduce manufacturing, verification, and test costs, while leading to more transient and permanent errors. It follows that most (if not all) of these errors will have to be compensated by architectural techniques (Nikolić et al., 2001; Constantinescu, 2003; Beiu et al., 2004; Beiu & Rückert, 2009).

From the system design perspective, errors fall into permanent (defects), intermittent, and transient faults. The origins of these errors can be found in the manufacturing process, in physical changes appearing during operation, and in sensitivity to internal and external noise and variations. It is not clear whether emerging nanotechnologies will require new fault models, or whether multiple simultaneous errors will have to be dealt with. Kuo (2006) even mentioned that: “we are unsure as to whether much of the knowledge that is based on past technologies is still valid for reliability analysis.”

The well-known approach for fighting errors is to incorporate redundancy: either static (in space, time, or information) or dynamic (requiring fault detection, location, containment, and recovery). Space (hardware) redundancy relies on voters (generic, inexact, mid-value, median, weighted average, analog, hybrid, etc.) and includes modular redundancy, cascaded modular redundancy, and multiplexing, such as von Neumann multiplexing, vN-MUX (von Neumann, 1952), enhanced vN-MUX (Roy & Beiu, 2004), and parallel restitution (Sadek et al., 2004). Time redundancy trades space for time, while information redundancy is based on error detection and error correction codes.

This chapter explores the performance of vN-MUX when using majority gates of fan-in ∆ (MAJ-∆). The aim is to gain a clear understanding of the trade-offs between the reliability enhancements obtained when using MAJ-∆ vN-MUX at the smallest redundancy factors, RF = 2∆ (see Fig. 1), on one side, and both the fan-ins and the unreliable nanodevices on the other. We shall start by reviewing some theoretical and simulation results for vN-MUX in the Background section. Exact gate-level simulations (based on an exhaustive counting algorithm) and accurate device-level estimates, including details of the effects nanodevices have on MAJ-∆ vN-MUX, are introduced in the Main Focus of the Chapter section. Finally, implications and future trends are discussed in Future Trends, and conclusions and further directions of research close the chapter.
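A central point of the chapter is that the device-level picture differs from the gate-level one. The deliberately crude sketch below (an illustration under stated assumptions, not the chapter's accurate device-level analysis) translates a per-device failure probability into a per-gate one by assuming that a MAJ-∆ gate contains about 2∆ devices and fails whenever any one of them fails. Because larger fan-in gates need more devices, the effective gate error grows with ∆, which can offset the gate-level gains; the device count, p_dev, and the initial wire error are all assumed values.

```python
from math import comb

def gate_error_from_devices(p_dev: float, n_devices: int) -> float:
    """Crude device-to-gate translation: assume the gate fails whenever any of
    its n_devices devices fails independently with probability p_dev."""
    return 1.0 - (1.0 - p_dev) ** n_devices

def maj_output_error(p_in: float, eps_gate: float, fan_in: int) -> float:
    """Gate-level map: strict majority over noisy inputs, followed by a
    symmetric output flip with probability eps_gate."""
    p_maj_wrong = sum(
        comb(fan_in, k) * p_in ** k * (1.0 - p_in) ** (fan_in - k)
        for k in range(fan_in // 2 + 1, fan_in + 1)
    )
    return p_maj_wrong * (1.0 - eps_gate) + (1.0 - p_maj_wrong) * eps_gate

if __name__ == "__main__":
    p_dev = 1e-3                        # assumed per-device failure probability
    for fan_in in (3, 5, 7, 9, 11):
        n_devices = 2 * fan_in          # assumed device count of a MAJ-gate (CMOS-like guess)
        eps_gate = gate_error_from_devices(p_dev, n_devices)
        p = 0.01                        # assumed initial wire error probability
        for _ in range(20):             # a few restorative stages
            p = maj_output_error(p, eps_gate, fan_in)
        print(f"MAJ-{fan_in}: eps_gate ≈ {eps_gate:.4f}, bundle error ≈ {p:.5f}")
```

Swapping in a different device count per gate, or a failure model in which not every device failure is fatal, changes the numbers but not the qualitative message that gate error itself depends on fan-in at the device level.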

Key Terms in this Chapter

Error Threshold: The probability of failure of a component (gate, device) above which the multiplexed scheme is not able to improve over the component itself.

Fan-In: Number of inputs (to a gate).

Circuit: Network of devices.

Gate (Logic): Functional building block (in digital logic a gate performs a logical operation on its logic inputs).

Monte Carlo: A class of stochastic computational algorithms (relying on pseudorandom numbers) for simulating the behaviour of physical and mathematical systems.

Counting (Exhaustive): The mathematical action of repeated addition; exhaustive counting considers all possible combinations.

Majority (Gate): A logic gate of odd fan-in which outputs a logic value equal to that of the majority of its inputs.

Multiplexing (von Neumann): A scheme for reliable computations based on successive computing stages alternating with random interconnection stages (introduced by von Neumann in 1952); a small simulation sketch is given after these key terms.

Fault-Tolerant: The ability of a system (circuit) to continue operating, possibly at a reduced performance level, rather than failing completely when some of its components fail.

Redundancy (Factor): Multiplicative increase in the number of (identical) components (subsystems, blocks, gates, devices), which can (automatically) replace (or augment) failing component(s).

Device: Any physical entity deliberately affecting the information-carrying particles (or their associated fields) in a desired manner, consistent with the intended function of the circuit.

Reliability: The ability of a circuit (system, gate, device) to perform and maintain its function(s) under given (as well as hostile or unexpected) conditions, for a certain period of time.
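Tying together the Monte Carlo, majority gate, and von Neumann multiplexing entries above, the following toy Monte Carlo sketch simulates a few restorative multiplexing stages: a bundle of wires is fed through independent random permutations (the random interconnect) and restored by MAJ-3 gates that may themselves flip their outputs. The bundle size, error probabilities, number of stages, and the assumption that every wire should carry a logic 1 are illustrative choices, far simpler than the chapter's RF = 2∆ setting.

```python
import random

def vn_mux_stage(bundle, eps_gate, fan_in=3):
    """One restorative stage: feed the bundle through `fan_in` independent
    random permutations (the random interconnect) and restore wire i as the
    majority of the i-th wires of those permutations; each majority gate
    flips its own output with probability eps_gate."""
    n = len(bundle)
    perms = [random.sample(bundle, n) for _ in range(fan_in)]
    out = []
    for i in range(n):
        votes = sum(perm[i] for perm in perms)
        bit = 1 if votes > fan_in // 2 else 0
        if random.random() < eps_gate:      # the gate itself may fail
            bit ^= 1
        out.append(bit)
    return out

def estimate_wire_error(n_wires=1000, p_wire=0.05, eps_gate=0.005,
                        stages=3, trials=100):
    """Monte Carlo estimate of the fraction of wrong wires after a few
    restorative stages, assuming the intended value on every wire is 1."""
    wrong = 0.0
    for _ in range(trials):
        bundle = [0 if random.random() < p_wire else 1 for _ in range(n_wires)]
        for _ in range(stages):
            bundle = vn_mux_stage(bundle, eps_gate)
        wrong += bundle.count(0) / n_wires
    return wrong / trials

if __name__ == "__main__":
    print("estimated wire error after restoration:", round(estimate_wire_error(), 4))
```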
