
Dealing with Multiple Simultaneous Faults in Future Technologies

Carlos A. L. Lisbôa, Erik Schüler, Luigi Carro

Presentation Transcript


  1. Dealing with Multiple Simultaneous Faults in Future Technologies. Carlos A. L. Lisbôa, Erik Schüler, Luigi Carro

  2. Why Multiple Simultaneous Faults? • Future technologies (2010 and beyond) • very small transistors with fewer electrons forming the channel (more prone to single event transients, SETs) • transient pulses caused by radiation strikes will last longer than gate propagation delays • devices will be more sensitive to the effects of electromagnetic noise, neutrons and alpha particles

  3. Single Event Upset Origin [figure: bit patterns illustrating a single event upset]

  4. Why Should One Study Multiple Faults? Change of paradigm: gates will behave statistically, producing correct outputs only part of the time.

  5. How to Deal with Multiple Faults? • New paradigm: multiple simultaneous faults • new fault tolerance techniques will be required (TMR will no longer provide enough protection)

  6. How to Deal with Multiple Faults? • New paradigm: multiple simultaneous faults • new fault tolerance techniques will be required (TMR will no longer provide enough protection) • How to deal with this problem? • new materials and manufacturing technologies must be developed, OR • new design approaches must be adopted

  7. How to Deal with Multiple Faults? • New paradigm: multiple simultaneous faults • new fault tolerance techniques will be required (TMR will no longer provide enough protection) • How to deal with this problem? • new design approaches must be adopted (our bet!)

  8. Research Approaches • Use of stochastic operators • Use of bit stream operators • Ensuring voter reliability so that n-MR can be used against multiple simultaneous faults • Next steps: 2005-2007 time frame

  9. Research Evolution [diagram: Stochastic Operators, Bit Stream Operators and Analog Voter, annotated with "small footprint and fast", "tolerant to multiple faults in n-MR solutions", "looking for more speed", "looking for tolerant converter" and "OK for some DSP applications"]

  10. Using Stochastic Operators • SEU-induced transient errors are random in nature

  11. Using Stochastic Operators • SEU-induced transient errors are random in nature • Stochastic operators rely on randomness to produce approximate results

  12. Using Stochastic Operators • SEU-induced transient errors are random in nature • Stochastic operators rely on randomness to produce approximate results • The injection of random faults into the input signals processed by stochastic operators did not impact the precision of the results [chart: % errors in 1,000 additions, stochastic adder vs. conventional adder, with 0, 2, 4 and 8 injected faults; plotted values 0.0000, 0.1412, 0.2580, 0.1768 and 0.2196]
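
A small software experiment makes the intuition behind slide 12 concrete. The sketch assumes the usual unipolar stochastic encoding (a value $p \in [0, 1]$ is the fraction of 1's in an $N$-bit stream) and models multiple simultaneous SEUs as $k$ bit flips at random positions; the slide does not state its stream length or fault model, and all function names are hypothetical. Because every flip changes the count of 1's by exactly one, $k$ flips move the decoded value by at most $k/N$, which stays small for long streams.

```python
import random

def encode(p: float, n: int, rng: random.Random) -> list[int]:
    """Unipolar stochastic encoding: each bit is 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def decode(stream: list[int]) -> float:
    """The represented value is the fraction of 1's in the stream."""
    return sum(stream) / len(stream)

def inject_flips(stream: list[int], k: int, rng: random.Random) -> list[int]:
    """Flip k distinct, randomly chosen bits: a crude model of k simultaneous SEUs."""
    faulty = stream.copy()
    for i in rng.sample(range(len(faulty)), k):
        faulty[i] ^= 1
    return faulty

if __name__ == "__main__":
    rng = random.Random(2005)
    n = 1024
    clean = encode(0.3, n, rng)
    for k in (0, 2, 4, 8):
        shift = abs(decode(inject_flips(clean, k, rng)) - decode(clean))
        # Each flip changes the count of 1's by one, so the shift is at most k / n.
        print(f"{k} faults: shift {shift:.4f} (bound {k / n:.4f})")
```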

  13. Using Stochastic Operators • SEU-induced transient errors are random in nature • Stochastic operators rely on randomness to produce approximate results • The injection of random faults into the input signals processed by stochastic operators did not impact the precision of the results • Several application areas (e.g., DSP) can work with approximate values and still produce acceptable outputs

  14. Using Stochastic Operators • Benefit: reduced area of the operators [figure: stochastic adder circuit with streams S1 = 01100010101, S2 = 01010101101, S3 = 0010100110101 and Sum = 010111011001; stochastic multiplier circuit with streams 1001000100001011, 1000000100001010 and 1000100110011010]
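
The slides do not spell out the adder and multiplier circuits. In standard unipolar stochastic computing, multiplication is a single AND gate (for independent streams, $P(a \land b) = P(a)\,P(b)$) and scaled addition is a 2:1 multiplexer whose select stream has density 0.5, giving $(P(a) + P(b))/2$. The sketch below assumes those two textbook circuits, which are not necessarily the authors' design; function names are hypothetical. The three 16-bit multiplier streams on slide 14 are consistent with the AND gate: 1001000100001011 AND 1000100110011010 = 1000000100001010 (the inputs encode 6/16 and 7/16, the output 4/16 = 0.25, an approximation of the exact product 0.164 that improves with longer streams). The adder streams have unequal lengths after extraction, so the multiplexer remains an assumption.

```python
import random

def stream(p: float, n: int, rng: random.Random) -> list[int]:
    """Unipolar stochastic encoding: each bit is 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def value(s: list[int]) -> float:
    """Decoded value: the fraction of 1's in the stream."""
    return sum(s) / len(s)

def to_bits(text: str) -> list[int]:
    return [int(c) for c in text]

def sc_multiply(a: list[int], b: list[int]) -> list[int]:
    """Stochastic multiplier: bitwise AND of two independent streams."""
    return [x & y for x, y in zip(a, b)]

def sc_scaled_add(a: list[int], b: list[int], sel: list[int]) -> list[int]:
    """Scaled stochastic adder: a 2:1 multiplexer (sel = 0 picks a, sel = 1 picks b).
    With P(sel) = 0.5 the output density is (P(a) + P(b)) / 2."""
    return [y if s else x for x, y, s in zip(a, b, sel)]

if __name__ == "__main__":
    # The multiplier streams from slide 14.
    a = to_bits("1001000100001011")
    b = to_bits("1000100110011010")
    print("".join(map(str, sc_multiply(a, b))))   # 1000000100001010

    # Longer random streams land closer to the exact results (0.2 and 0.45).
    rng = random.Random(42)
    n = 20_000
    x, y, sel = stream(0.5, n, rng), stream(0.4, n, rng), stream(0.5, n, rng)
    print(value(sc_multiply(x, y)), value(sc_scaled_add(x, y, sel)))
```

Both operators are a single gate per output bit, which is where the area benefit claimed on the slide comes from; the price is that precision depends on stream length.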

  15. Using Stochastic Operators: How does it work? Come and see the posters! No free drinks, but the answer to this question is guaranteed!

  16. Using Bit Stream Operators • Computation principles similar to those of the stochastic adder and multiplier

  17. Using Bit Stream Operators • Computation principles similar to those of the stochastic adder and multiplier • Operators can produce bit streams that represent the exact results of the operation [figure: Proposed Multiplication Algorithm, bit stream product (the count of 1's in the stream is equal to the product value). The 3-bit operands $F1 = F1_2 F1_1 F1_0$ and $F2 = F2_2 F2_1 F2_0$ form the partial products $F2_i F1_j$ ($i, j \in \{0, 1, 2\}$), which are laid out in a 49-bit stream $b_{48} \dots b_0$ grouped as $b_{48}..b_{33}$, $b_{32}..b_{17}$, $b_{16}..b_5$, $b_4..b_1$ and $b_0$]
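
One reading of the slide-17 figure, stated as an assumption: with 3-bit operands, the partial product $F2_i F1_j$ has weight $2^{i+j}$, so writing each one into the stream $2^{i+j}$ times gives a count of 1's equal to $\sum_{i=0}^{2}\sum_{j=0}^{2} 2^{i+j}\, F2_i F1_j = F1 \cdot F2$. Grouping the copies by weight yields segments of 16, 16, 12, 4 and 1 bits, which matches $b_{48}..b_{33}$, $b_{32}..b_{17}$, $b_{16}..b_5$, $b_4..b_1$ and $b_0$. The function name is hypothetical.

```python
def bit_stream_product(f1: int, f2: int, width: int = 3) -> list[int]:
    """Return a stream whose count of 1's equals f1 * f2 (unsigned, `width` bits each).

    Partial products F2_i AND F1_j are emitted highest weight first, each repeated
    2**(i + j) times; for width = 3 the stream has (2**3 - 1)**2 = 49 bits."""
    stream: list[int] = []
    for k in range(2 * (width - 1), -1, -1):      # weight exponent k = i + j
        for i in range(width):
            j = k - i
            if 0 <= j < width:
                pp = ((f2 >> i) & 1) & ((f1 >> j) & 1)   # one AND gate per partial product
                stream.extend([pp] * 2 ** k)
    return stream

if __name__ == "__main__":
    s = bit_stream_product(5, 6)
    print(len(s), sum(s))   # 49 30
```

No carry propagation is needed to build the stream: every output bit is a plain AND of two operand bits, and the arithmetic is deferred to the point where the 1's are counted.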

  18. Using Bit Stream Operators • Computation principles similar to those of the stochastic adder and multiplier • Operators can produce bit streams that represent the exact results of the operation • Redundancy is added to the bit streams so that they can withstand multiple bit flips [figure: Adding robustness to the bit stream through redundancy. Each bit $b_{48}, b_{47}, \dots, b_0$ is repeated 8 times, followed by the bits 1 1 1 1 0 0 0 (+4), so that the total count of 1's $= 8 \times \text{product} + 4$]
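
The slide gives the encoding (each bit repeated 8 times, plus four extra 1's) but not the decoder. A natural decoder, assumed here and not taken from the slides, is integer division of the count of 1's by 8: the count is $8p + 4$, so up to 3 bit flips in either direction keep it within $[8p+1,\ 8p+7]$ and it still decodes to $p$. Function names are hypothetical.

```python
import random

COPIES = 8  # replication factor from the slide

def add_redundancy(stream: list[int]) -> list[int]:
    """Repeat every bit COPIES times and append 1 1 1 1 0 0 0,
    so that count(1's) = 8 * value + 4."""
    redundant = [bit for bit in stream for _ in range(COPIES)]
    return redundant + [1, 1, 1, 1, 0, 0, 0]

def decode_redundant(redundant: list[int]) -> int:
    """Assumed decoder: floor(count / 8). The +4 offset centers the count
    in its interval, so up to 3 flipped bits cannot change the result."""
    return sum(redundant) // COPIES

def flip_random(bits: list[int], k: int, rng: random.Random) -> list[int]:
    """Flip k distinct random positions (a simple multiple-fault model)."""
    out = bits.copy()
    for i in rng.sample(range(len(out)), k):
        out[i] ^= 1
    return out

if __name__ == "__main__":
    product = 30
    encoded = add_redundancy([1] * product + [0] * (49 - product))
    rng = random.Random(1)
    ok = all(decode_redundant(flip_random(encoded, 3, rng)) == product for _ in range(10_000))
    print(ok)   # True: 3 flips are always tolerated
```

The same arithmetic shows the limit: four 0-to-1 flips push the count to $8p + 8$ and the decoder returns $p + 1$, so the replication factor fixes how many faults can be absorbed.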

  19. Using Bit Stream Operators • Computation principles similar to those of the stochastic adder and multiplier • Operators can produce bit streams that represent the exact results of the operation • Redundancy is added to the bit streams so that they can withstand multiple bit flips • Conversion of bit streams to binary-coded values is delayed as much as possible, and the conversion circuits must use TMR or n-MR for protection against faults

  20. Using Bit Stream Operators • Computation principles similar to those of the stochastic adder and multiplier • Operators can produce bit streams that represent the exact results of the operation • Redundancy is added to the bit streams so that they can withstand multiple bit flips • Conversion of bit streams to binary-coded values is delayed as much as possible, and the conversion circuits must use TMR or n-MR for protection against faults • Issues to be further investigated: size of the bit streams and area of the conversion circuits

  21. Using Bit Stream Operators: How does it work? Come and see the posters! No free food, but more information on this subject will be provided!

  22. What is Wrong with TMR? • TMR protects only against single faults in one of the modules [diagram: modules 1, 2 and 3 each deliver a correct output to the VOTER, whose output is correct]

  23. What is Wrong with TMR? • TMR protects only against single faults in one of the modules [diagram: module 2 delivers a wrong output while modules 1 and 3 are correct; the VOTER output is still correct]

  24. What is Wrong with TMR? • TMR does not protect against double faults in different modules [diagram: modules 1 and 3 deliver wrong outputs while module 2 is correct; the VOTER output is wrong]
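
A digital TMR voter computes a bitwise 2-out-of-3 majority. The sketch below (hypothetical function name) shows three cases: a single faulty module is outvoted; two faulty modules that corrupt the same bit outvote the correct one, as in slide 24; and two faults in different bit positions of a multi-bit word are still masked, because each bit is voted separately. With single-bit module outputs, as drawn on the slides, every double fault is of the second kind.

```python
def majority3(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority, as computed by a digital TMR voter."""
    return (a & b) | (a & c) | (b & c)

if __name__ == "__main__":
    correct = 0b1011
    # Single fault: module 2 flips bit 0 and is outvoted.
    print(majority3(correct, correct ^ 0b0001, correct) == correct)           # True
    # Double fault, same bit in modules 1 and 3: the wrong value wins.
    print(majority3(correct ^ 0b0001, correct, correct ^ 0b0001) == correct)  # False
    # Double fault, different bits in modules 1 and 3: still masked.
    print(majority3(correct ^ 0b0001, correct, correct ^ 0b0010) == correct)  # True
```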

  25. What is Wrong with TMR? • When a single fault occurs in the voter circuit, the voter output may be wrong [diagram: modules 1, 2 and 3 each deliver a correct output to the VOTER, whose output is correct]

  26. What is Wrong with TMR? • When a single fault occurs in the voter circuit, the voter output may be wrong [diagram: modules 1, 2 and 3 deliver correct outputs, but the VOTER output is uncertain (?)]

  27. Making TMR (n-MR) more reliable • Known solutions incur • area, performance and/or power penalties • a deadlock: how to protect the output generator?

  28. Making TMR (n-MR) more reliable • Known solutions incur • area, performance and/or power penalties • a deadlock: how to protect the output generator? • Proposed solution: • use TMR to cope with single faults in the modules

  29. Making TMR (n-MR) more reliable • Known solutions incur • area, performance and/or power penalties • a deadlock: how to protect the output generator? • Proposed solution: • use TMR to cope with single faults in the modules • replace the digital voter with an analog voter that • uses a comparator to generate the output

  30. Making TMR (n-MR) more reliable • Known solutions incur • area, performance and/or power penalties • a deadlock: how to protect the output generator? • Proposed solution: • use TMR to cope with single faults in the modules • replace the digital voter with an analog voter that • uses a comparator to generate the output • tolerates some noise while still producing the correct result
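
The slides describe the analog voter only as a comparator that generates the output and tolerates some noise; the circuit itself (slide 31) is not reproduced here. The behavioral sketch below assumes that the module outputs are averaged and that the comparator decides against $V_{DD}/2$, with $V_{DD} = 3.3$ V chosen as a typical supply for 0.35 µm CMOS. Under that assumption, with one faulty module the averaged voltage sits at $2V_{DD}/3$, leaving a margin of $V_{DD}/6$ above the threshold for noise. All names and values are hypothetical.

```python
import random

VDD = 3.3  # assumed supply voltage in volts

def analog_vote(voltages: list[float], vdd: float = VDD) -> int:
    """Behavioral model of an averaging voter: compare the mean input
    voltage with vdd / 2 and return the resulting logic level."""
    mean = sum(voltages) / len(voltages)
    return 1 if mean > vdd / 2 else 0

def noisy(level: int, sigma: float, rng: random.Random, vdd: float = VDD) -> float:
    """Ideal logic level (0 V or vdd) plus Gaussian noise (std. dev. sigma volts)."""
    return level * vdd + rng.gauss(0.0, sigma)

if __name__ == "__main__":
    rng = random.Random(7)
    trials = 10_000
    for sigma in (0.1, 0.3, 0.6):
        wrong = 0
        for _ in range(trials):
            # TMR with module 2 faulty: modules 1 and 3 drive 1, module 2 drives 0.
            inputs = [noisy(1, sigma, rng), noisy(0, sigma, rng), noisy(1, sigma, rng)]
            wrong += analog_vote(inputs) != 1
        print(f"sigma = {sigma} V: {wrong} wrong votes in {trials} trials")
```

This model says nothing about faults inside the comparator itself, which is what the SPICE fault-injection experiments on the following slides address.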

  31. The Analog Voter

  32. Minimum-Area Comparator: injection of faults in the comparator (*) [(*) using CMOS 0.35 µm]

  33. Electrical Simulation: Multiple Faults (SPICE and CMOS 0.35 µm)

  34. Dealing with Multiple Simultaneous Faults: n-MR. The Analog Voter with 5 Inputs (for 5-MR)

  35. Dealing with Multiple Simultaneous Faults: n-MR. The Analog Voter with 5 Inputs (for 5-MR). Simulations injecting 2 simultaneous faults also succeeded
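
Slide 35 reports that the 5-input analog voter also survived two simultaneous faults. At the logic level this is the guarantee of any 5-input majority: 3 correct inputs always outvote 2 wrong ones, while a third fault breaks it. The sketch checks this exhaustively and, assuming the analog voter averages its inputs and compares the mean with $V_{DD}/2$ (not confirmed by the slides), computes the worst-case noise margin $(\lceil n/2 \rceil / n - 1/2)\,V_{DD}$, which shrinks from $V_{DD}/6$ for TMR to $V_{DD}/10$ for 5-MR. Function names are hypothetical.

```python
from itertools import combinations

def majority(bits: list[int]) -> int:
    """n-input majority vote (n odd): 1 if more than half of the inputs are 1."""
    return 1 if 2 * sum(bits) > len(bits) else 0

def masks_all(n: int, faults: int) -> bool:
    """True if every set of `faults` wrong modules out of n is outvoted,
    whichever value (0 or 1) is the correct one."""
    for correct in (0, 1):
        for wrong in combinations(range(n), faults):
            outputs = [1 - correct if i in wrong else correct for i in range(n)]
            if majority(outputs) != correct:
                return False
    return True

def worst_case_margin(n: int, vdd: float) -> float:
    """Assumed averaging voter: with (n + 1) // 2 inputs at vdd and the rest at 0 V,
    distance (in volts) between the mean input voltage and the vdd / 2 threshold."""
    return ((n + 1) // 2 / n - 0.5) * vdd

if __name__ == "__main__":
    for n in (3, 5):
        tolerated = max(f for f in range(n + 1) if masks_all(n, f))
        print(n, tolerated, round(worst_case_margin(n, 3.3), 3))
```

Under this model, adding modules buys tolerance to more simultaneous faults at the cost of a smaller analog noise margin per input.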

  36. Does this work??? The Analog Voter ... Oops!

  37. The Analog Voter: let's see the posters!

  38. Future Work - Short Term (2005-2006) • use of signal redundancy with other number representation forms, such as Sigma-Delta

  39. Future Work - Short Term (2005-2006) • use of signal redundancy with other number representation forms, such as Sigma-Delta • use of the analog voter as an efficient way to implement robust n-MR circuits

  40. Future Work - Short Term (2005-2006) • use of signal redundancy with other number representation forms, such as Sigma-Delta • use of the analog voter as an efficient way to implement robust n-MR circuits • investigate the application of statistical methods and neural networks to the design of fault-tolerant circuits with minimum redundancy

  41. Future Work - Long Term (2006-2007) • use of logic properties to develop low-cost signal redundancy

  42. Future Work - Long Term (2006-2007) • use of logic properties to develop low-cost signal redundancy • apply the developed techniques to actual processors with DSP and VLIW architectures

  43. Future Work - Long Term (2006-2007) • use of logic properties to develop low-cost signal redundancy • apply the developed techniques to actual processors with DSP and VLIW architectures • discuss the architectural impact of new technologies together with fault tolerance

  44. Research Evolution [timeline diagram: Stochastic Operators, Bit Stream Operators and Analog Voter; time axis: previous work (2004-2005), 2005, 2006, 2007]

  45. Research Evolution [timeline diagram as in slide 44, adding Sigma Delta]

  46. Research Evolution [timeline diagram as in slide 44, adding Sigma Delta and Logic Properties]

  47. Research Evolution [timeline diagram as in slide 44, adding Sigma Delta, Logic Properties and Low cost redundancy]

  48. Research Evolution [timeline diagram as in slide 44, adding Sigma Delta, Logic Properties, Low cost redundancy and Application to actual DSP and VLIW processors (DSP / VLIW)]

  49. Thank You! Questions? Looking forward to answering them at the poster booth (#20.4)! Contact: calisboa@inf.ufrgs.br. No free anything, but a nice chat about these matters will be a pleasure!
