Explore techniques for enhancing fault-tolerance in computing devices to boost reliability, performance, and power efficiency. Investigate redundancy-driven synthesis to determine optimal replication strategies for reliable systems.
Yield and Redundancy
Marc Riedel, Caltech; Iris Bahar, Brown U.; Etienne Jacobs, Magma; Diana Marculescu, CMU; Phillip Stanley-Marbell, CMU; Eric Rotenberg, NCSU
Problem
• The goal: achieving reliable computing systems from devices with high defect rates
• Reliability-aware synthesis
  • Given a technique for improving fault tolerance, how do we judge its efficacy in terms of a combination of performance, reliability, power consumption, etc.?
• Redundancy-driven synthesis: what to replicate?
  • Observable nodes
  • Devices with high fanout
  • Instead of redundancy removal, redundancy addition for increased reliability
• Fault model
  • Where to handle it? Level of abstraction
  • What to handle? Types of faults
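One way to judge a replication technique against its cost is to compare a single module with its triplicated version. The sketch below is not from the slides; it assumes triple modular redundancy (TMR) with independent failures, a per-module success probability `r`, and a voter that works with probability `voter`. The function name and parameters are hypothetical.

```python
def tmr_reliability(r, voter=1.0):
    """Probability that a TMR block produces the correct output.

    Assumptions (illustrative, not from the slides):
    - the three replicas fail independently, each correct with probability r
    - the output is correct only if at least 2 of 3 replicas are correct
      and the voter itself works (probability `voter`)
    """
    at_least_two = 3 * r**2 - 2 * r**3  # P(exactly 2 correct) + P(all 3 correct)
    return voter * at_least_two


if __name__ == "__main__":
    for r in (0.4, 0.9, 0.99):
        print(f"r={r}: simplex={r:.4f}, TMR={tmr_reliability(r):.4f}")
```

Under these assumptions, triplication helps only when `r` exceeds 0.5, and an unreliable voter can cancel the gain, all at roughly three times the area and power.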
Related Research
[Figure: diagram with nodes labeled O1, m, O2, m, O3]
• Logic level
  • Von Neumann '56
  • Assumptions
  • Pippenger '94
  • Purely theoretical, not automated!
• RT level
  • Still open?
• Architectural level
  • Slipstream processors (NCSU)
  • DIVA (UMich)
• System level
  • CMP-based mainframes do use redundancy for increased fault tolerance!
What is most susceptible to failures?
• Failures at inputs versus outputs
  • At inputs: a failure potentially propagates throughout the circuit
  • ... but may be masked by other signals
  • At a primary output: a failure must be masked for correct I/O behavior!
• Need a measure of:
  • How susceptible a gate is to failure...
  • ...or which devices, when failed, are most critical to the correct functioning of the system
• Here synthesis can play a major role!
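A measure of which devices are most critical can be estimated by fault injection. The sketch below assumes a hypothetical three-gate netlist and a simple fault model (a failed gate inverts its output); neither comes from the slides. For each gate it reports the fraction of input vectors on which the failure reaches a primary output.

```python
from itertools import product

# Hypothetical netlist: (gate name, function, fan-in signal names),
# listed in topological order.
NETLIST = [
    ("g1", lambda a, b: a & b, ("a", "b")),
    ("g2", lambda c: 1 - c, ("c",)),
    ("out", lambda x, y: x | y, ("g1", "g2")),
]
PRIMARY_INPUTS = ("a", "b", "c")
PRIMARY_OUTPUTS = ("out",)


def simulate(inputs, flipped=None):
    """Evaluate the netlist; if `flipped` names a gate, invert its output."""
    values = dict(inputs)
    for name, fn, fanin in NETLIST:
        v = fn(*(values[s] for s in fanin))
        if name == flipped:
            v ^= 1  # assumed fault model: the failed gate inverts its output
        values[name] = v
    return tuple(values[o] for o in PRIMARY_OUTPUTS)


def criticality():
    """Fraction of input vectors for which each gate's failure is observable."""
    vectors = [
        dict(zip(PRIMARY_INPUTS, bits))
        for bits in product((0, 1), repeat=len(PRIMARY_INPUTS))
    ]
    scores = {}
    for name, _, _ in NETLIST:
        wrong = sum(simulate(v, flipped=name) != simulate(v) for v in vectors)
        scores[name] = wrong / len(vectors)
    return scores


if __name__ == "__main__":
    for gate, score in sorted(criticality().items(), key=lambda kv: -kv[1]):
        print(gate, score)
```

In this toy circuit the output gate scores 1.0 because nothing downstream can mask it, while `g1` scores 0.5 because the OR gate masks it whenever `g2` is 1. Exhaustive enumeration only scales to small circuits; random sampling of input vectors would be the usual substitute.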
Possible Approaches?
• What works? ... A lot of redundancy!
• Biologically inspired approaches
  • Can models of how the brain works, and work on neural nets, be used instead of traditional logic?
• Models of computation and their relation to the ability to synthesize fault-free systems
  • Do we need to have or emulate another type of logic (e.g., threshold logic)?
• For analysis: borrow and extend ideas from information theory
Open Questions
• No guarantee of complete reliability, but rather a specifiable probability of correct functioning
• Reduce the cost of testing by testing only what really matters
  • Check only the checker!
• What parts of the circuit should be made redundant?
  • Identify what's important and what's likely to fail
• How does adding synthesis methods for fault tolerance increase the complexity of verification?
  • E.g., speculate and then check using redundant logic. Who's going to verify that? (Or do we need to?)
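A specifiable probability of correct functioning can be turned into a replication count. The sketch below assumes N-modular redundancy with independent replicas, a perfect majority voter, and a per-replica success probability `r`. These are illustrative assumptions, and the function names are hypothetical.

```python
from math import comb


def majority_reliability(r, n):
    """P(a strict majority of n independent replicas is correct)."""
    need = n // 2 + 1
    return sum(comb(n, k) * r**k * (1 - r) ** (n - k) for k in range(need, n + 1))


def min_replicas(r, target, max_n=99):
    """Smallest odd replica count meeting `target`, or None if none up to max_n.

    Assumes a perfect voter; with r <= 0.5 adding replicas does not help,
    so the search returns None for any target above r.
    """
    for n in range(1, max_n + 1, 2):
        if majority_reliability(r, n) >= target:
            return n
    return None


if __name__ == "__main__":
    print(min_replicas(0.9, 0.99))  # 3 replicas give 0.972, 5 give about 0.9914
```

The result depends heavily on the independence assumption: correlated defects, which are likely on a single die, would make the real count higher.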