Self-Checking Fault Detection using Discrepancy Mirrors

Self-Checking Fault Detection using Discrepancy Mirrors. PDPTA 2005, Las Vegas. Ronald F. DeMara, Carthik A. Sharma, University of Central Florida.


Presentation Transcript


  1. Self-Checking Fault Detection using Discrepancy Mirrors. PDPTA 2005, Las Vegas. Ronald F. DeMara, Carthik A. Sharma, University of Central Florida

  2. Fault Handling Overview • Failure • Manifestation of a fault • Deviation from expected behavior • Detection • Identify occurrence of fault • Fully articulating inputs • Intermittently articulating inputs • Methods • Coding based schemes • Redundancy • Isolation • Physical location of fault [Figure: PCI-based card used for the Xilinx Virtex-II Pro based Autonomous Repair Testbed]

  3. Ideal Detection Characteristics • Faults in the detector are covered by the detector itself • Fault-secure • Self-testing • No “Golden Elements” • Multiple types of faults handled by the same detector • Transient and Permanent faults • Logic and Interconnect faults • Minimum number of false positives • Accuracy and reliability • Minimal power consumption • Verifiable correctness • Practical Assessment • Fitness assessment should be tractable

  4. Discrepancy Mirror • Mechanism for Checking-the-Checker (“golden element” problem) • Makes checker part of configuration that competes for correctness [DeMara PDPTA-05] [Figure: Fault Coverage]

  5. Discrepancy Mirror Circuit Fault Coverage

  6. Discrepancy Mirror Truth Table • Discrepancy Mirror Truth Table ensures complete coverage of detector. • Single Point of Failure reduced to a stuck-at fault exposure for MATCH output (Wired-Or)
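The circuit figure and truth table did not survive in this transcript, so the following is only a minimal sketch of how such a table could be enumerated. It assumes a structure suggested by components named elsewhere in the slides (an XNOR checker, tri-state buffers, a pull-down resistor, a wired-OR MATCH line): two XNOR gates compare the same pair of outputs, and each drives a tri-state buffer that is enabled by the other gate. The function names and the `*_stuck` parameters are hypothetical.

```python
from itertools import product

def mirror_match(q1, q2, xnor_a_stuck=None, xnor_b_stuck=None):
    """MATCH output of an assumed cross-enabled discrepancy mirror.

    q1, q2 are the single-bit outputs being compared. A *_stuck argument
    of 0 or 1 models a stuck-at fault on that XNOR gate's output.
    """
    xnor_a = int(q1 == q2) if xnor_a_stuck is None else xnor_a_stuck
    xnor_b = int(q1 == q2) if xnor_b_stuck is None else xnor_b_stuck
    # Each tri-state buffer passes its own XNOR value only while the
    # other XNOR enables it; None stands for high impedance.
    drive_a = xnor_a if xnor_b == 1 else None
    drive_b = xnor_b if xnor_a == 1 else None
    # Wired-OR with pull-down: the line is 1 only if some buffer drives 1.
    return int(any(d == 1 for d in (drive_a, drive_b)))

def false_match_cases():
    """List (fault, q1, q2) cases where MATCH is 1 although q1 != q2."""
    faults = [(None, None), (0, None), (1, None), (None, 0), (None, 1)]
    return [
        ((fa, fb), q1, q2)
        for (fa, fb), q1, q2 in product(faults, (0, 1), (0, 1))
        if q1 != q2 and mirror_match(q1, q2, fa, fb) == 1
    ]
```

Under these assumptions, no single stuck-at fault on either XNOR output can produce a false MATCH; a fault on the shared MATCH line itself is not modeled here, which corresponds to the remaining stuck-at exposure the slide mentions.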

  7. Discrepancy-Enabled Isolation

  8. Discrepancy Mirror Approach • Selection Phase • Two candidates chosen from population • Use mutually exclusive resources • Carry out computation in tandem • Detection Phase • Discrepancy Mirror compares outputs • MATCH output signifies fault free configurations • Faults in the detector also covered • Preference Adjustment Process • Detector output over time indicates relative fitness • Relative fitness can be used to choose candidates
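The three phases above can be sketched as a single round of a loop. This is an illustrative sketch only: the fitness-weighted selection rule, the `step` and `floor` parameters, and all names are assumptions, not the authors' preference adjustment scheme.

```python
import random

def duel_round(fitness, left, right, outputs, rng, step=1, floor=1):
    """Run one selection / detection / preference-adjustment round.

    fitness: dict mapping configuration name to a positive score
    left, right: lists of configuration names using disjoint resources
    outputs: dict mapping configuration name to its output for this input
    """
    # Selection: one candidate per half, weighted by current fitness.
    l = rng.choices(left, weights=[fitness[c] for c in left])[0]
    r = rng.choices(right, weights=[fitness[c] for c in right])[0]
    # Detection: MATCH when both candidates produce the same output.
    match = outputs[l] == outputs[r]
    # Preference adjustment: reward both on MATCH, penalize both otherwise.
    delta = step if match else -step
    for c in (l, r):
        fitness[c] = max(floor, fitness[c] + delta)
    return l, r, match

def run_demo(rounds=200, seed=1):
    """Hypothetical population in which L1 has a faulty output bit."""
    rng = random.Random(seed)
    left, right = ["L0", "L1"], ["R0", "R1"]
    outputs = {"L0": 0b1010, "L1": 0b1000, "R0": 0b1010, "R1": 0b1010}
    fitness = {c: 10 for c in left + right}
    for _ in range(rounds):
        duel_round(fitness, left, right, outputs, rng)
    return fitness
```

In `run_demo`, the faulty configuration `L1` can only lose fitness while `L0` can only gain it, so the detector's output over time separates the two even though a single discrepancy does not say which partner was at fault.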

  9. CRR Arrangement in SRAM FPGA • Configurations in Population • C = CL ∪ CR • CL = subset of left-half configurations • CR = subset of right-half configurations • |CL| = |CR| = |C|/2 • Discrepancy Operator • Baseline Discrepancy Operator is a dyadic operator with binary output • Z(Ci) is the FPGA data throughput output of configuration Ci • Each half-configuration evaluates the discrepancy operator using an embedded checker (XNOR gate) within each individual • Any fault in the checker lowers that individual’s fitness so that the individual is no longer preferred and eventually undergoes repair • Operator variants labeled on the slide: RS (Hamming Distance), WTA (Equivalence)
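The slide names two ways of comparing the outputs Z(Ci) of a left and a right configuration: a Hamming distance and an equivalence check. The operator's defining equation is missing from the transcript, so the sketch below only illustrates those two comparisons on integer-encoded output words; the function names are hypothetical.

```python
def hamming_discrepancy(z_left: int, z_right: int) -> int:
    """Number of bit positions in which the two output words differ."""
    return bin(z_left ^ z_right).count("1")

def equivalence(z_left: int, z_right: int) -> int:
    """Binary result: 1 when the outputs agree, 0 otherwise.

    For single-bit outputs this is what an XNOR gate computes.
    """
    return int(z_left == z_right)
```

For example, `hamming_discrepancy(0b1011, 0b0010)` counts the two differing bits, while `equivalence` reports only that the words differ.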

  10. Overview of FPGA operation • Competing Configurations • Configurations A and B are physically distinct • CA = subset consisting of ‘A’ configurations • CB = subset consisting of ‘B’ configurations • |CA| = |CB| = |C|/2 • Discrepancy Operator • Baseline Discrepancy Operator is a dyadic operator with binary output • Z(Ci) is the FPGA data throughput output of configuration Ci • Each half-configuration evaluates the discrepancy operator using an embedded checker (XNOR gate) within each individual • Any fault in the checker or functional logic lowers the fitness of the resources used by that individual, leading to isolation [Figure: SRAM-based FPGA holding Configuration A (Function Logic A, Discrepancy Mirror A) and Configuration B (Function Logic B, Discrepancy Mirror B), with input data, data output, a configuration bit stream from an off-chip EEPROM, and feedback control to the Reconfiguration Algorithm. Note: a non-volatile memory is already required to boot any SRAM FPGA from cold start, so this is not an additional chip.]

  11. Discrepancy Mirror Schematic: CMOS • PSpice Schematic • 44 p- and n-channel MOS transistors • 1.5 micron minimum width • 600 nm length • Width of p-MOS transistors = 3 × width of n-MOS transistors

  12. Discrepancy Mirror Schematic: Xilinx • Xilinx Schematic • Virtex-II Pro FPGA • ModelSim-II Simulator • Emulated (digital) pull-down resistor

  13. Discrepancy Mirror Simulation: CMOS Circuit • Transient Response • Behavior conforms to specifications • Correct identification of discrepancy

  14. Discrepancy Mirror Simulation: Xilinx ModelSim-II • Circuit Response • Output ‘High’ (== 1) when input q1 == q2 • Output ‘Low’ when input q1 != q2 • In Xilinx FPGAs, ‘Low’ is not exactly equal to zero, but is a Logic ‘zero’ nevertheless.

  15. Fault Location Experiments • Two experiments conducted • C-language program simulator • Locate fault by successive intersections • v-subsets or groups of resources • Fault identified after m comparisons – what is the value of m? • Identify number of iterations required to identify single-fault • Random inputs, Single stuck-at fault • Expected number of pairings over 100 simulations • One ‘resource’ equivalent to one CLB ( > 10 gates) • Experiment 1 • Perpetually articulating inputs • Experiment 2 • Intermittently articulating inputs

  16. Fault Location Using Dueling • Let U denote the set of all logic resources on the FPGA • S denotes the pool of resources suspected of being faulty; initially S = U • v_i denotes the set of resources used by the i-th configuration • To isolate the fault, m successive intersections are performed, at the end of which |S| = 1 • With pre-designed partitions to achieve maximal isolation • Isolation can be completed in 2n iterations, where n = | |
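The successive-intersection idea can be sketched with plain set operations. This is a minimal sketch under stated assumptions: the suspect pool starts as the full resource set, each observation records the resources used by one pairing together with whether a discrepancy was seen, and the `exonerate_on_match` flag (a hypothetical name) is only valid when every input articulates the fault.

```python
def locate_fault(all_resources, observations, exonerate_on_match=True):
    """Narrow the suspect pool until a single resource remains.

    observations: iterable of (resources_used, discrepancy_seen) pairs.
    Returns (suspects, number_of_observations_consumed).
    """
    suspects = set(all_resources)
    steps = 0
    for used, discrepant in observations:
        if len(suspects) == 1:
            break
        used = set(used)
        if discrepant:
            # The fault must lie in the resources that were exercised.
            suspects &= used
        elif exonerate_on_match:
            # Sound only if a fault in `used` would always be observed.
            suspects -= used
        steps += 1
    return suspects, steps
```

With eight resources and a fault in resource 5, three discrepant pairings using {4, 5, 6, 7}, {0, 1, 4, 5} and {1, 3, 5, 7} shrink the pool to {4, 5, 6, 7}, then {4, 5}, then {5}.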

  17. Analysis with Perpetually Articulating Inputs • Perpetually Articulating Inputs • No observed discrepancy implies fault-free resources • Best Case (50% Utilized Capacity): • 11.1 pairings for 1,000 resources • 17.6 pairings for 100,000 resources • Most Demanding Case: • 63.7 pairings for 100,000 resources with 5% capacity utilization.

  18. Analysis with Intermittently Articulating Inputs • Intermittently Articulating Inputs • Inputs may be such that the fault is not articulated at the outputs • No observed discrepancy does not imply fault-free resources • Only discrepant outputs provide fault-location information • Best Case (45% Utilized Capacity): • 42 pairings for 1,000 resources • 64.1 pairings for 100,000 resources • Most Demanding Case: • 478 pairings for 100,000 resources with 95% capacity utilization. 50% of the inputs articulate the fault
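The difference between the two input models can be illustrated with a small Monte Carlo sketch. It is not the authors' C-language simulator and is not expected to reproduce the figures above: it assumes each pairing exercises a uniformly random subset of resources, a single fault, and a fixed probability `p_articulate` that an input exposes the fault. All names and parameters are hypothetical.

```python
import random

def pairings_to_isolate(n, utilization, p_articulate, rng, max_pairings=100_000):
    """Count pairings until exactly one suspect resource remains."""
    fault = rng.randrange(n)
    suspects = set(range(n))
    k = max(1, int(n * utilization))  # resources exercised per pairing
    for pairing in range(1, max_pairings + 1):
        used = set(rng.sample(range(n), k))
        discrepant = fault in used and rng.random() < p_articulate
        if discrepant:
            suspects &= used
        elif p_articulate >= 1.0:
            # Perpetually articulating inputs: a match exonerates `used`.
            suspects -= used
        # Intermittent inputs: a match carries no location information.
        if len(suspects) == 1:
            return pairing
    return max_pairings

def mean_pairings(n, utilization, p_articulate, runs=100, seed=0):
    """Average pairing count over several simulated single-fault runs."""
    rng = random.Random(seed)
    total = sum(
        pairings_to_isolate(n, utilization, p_articulate, rng)
        for _ in range(runs)
    )
    return total / runs
```

Calling `mean_pairings(200, 0.5, 1.0)` and `mean_pairings(200, 0.5, 0.5)` shows the qualitative effect described on the slide: when matches carry no information, many more pairings are needed to reach a single suspect.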

  19. Experimental Results Summary • Number of iterations to locate faults depends on Utilized Capacity • Designs that utilize very few (< 20%) or almost all (> 80%) of the resources on the FPGA pose difficult isolation problems • Each intersection exonerates (implicates) fewer individual resources • Method scales well • 11.1, 14.9, and 17.6 pairings required for 1,000, 10,000, and 100,000 resources, respectively. Sub-linear increase in location time. • Current Work • Competitive Runtime Reconfiguration (CRR) framework under development, which will utilize the methods outlined • Investigation of Competitive Group Testing methods to enable faster fault isolation • Analysis of characteristics of isolation, dependency on parameters, optimal partitioning methods.
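The sub-linear claim can be checked with simple arithmetic on the three reported data points. The sketch below tabulates pairings per resource and, as a reference of our own choosing rather than anything stated on the slide, the value of log2(n).

```python
import math

# Pairings reported on the slide for 1,000, 10,000 and 100,000 resources.
reported = {1_000: 11.1, 10_000: 14.9, 100_000: 17.6}

def scaling_table(data):
    """Return rows of (resources, pairings, log2(resources), pairings per resource)."""
    return [
        (n, pairings, math.log2(n), pairings / n)
        for n, pairings in sorted(data.items())
    ]

for n, pairings, log_n, per_resource in scaling_table(reported):
    print(f"{n:>7} resources: {pairings:5.1f} pairings, "
          f"log2(n) = {log_n:5.2f}, pairings/resource = {per_resource:.6f}")
```

A 100-fold increase in resources raises the pairing count by a factor of about 1.6 (17.6 / 11.1), and the pairings per resource fall at each step, which is consistent with growth far slower than linear.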

  20. Backup Slides Follow

  21. Accommodating Multi-bit Word Widths • Proof of concept • The present circuit works efficiently • Demonstrates important Dueling-enabled isolation method • Strategies • Use an array of detectors • attempt to minimize points of failure as word-width increases • Number of logic resources used is acceptable for smaller circuits • Create new circuit or scheme, combining fault tolerant coding-based methods with single-fault secure circuit • Current research focused on improving detector by investigating codes, and fault-secure circuits

  22. Pull-down Resistor Considerations • Proof of concept • The present circuit works in a verifiably correct manner • Can utilize synthesized (digital) pull-down resistors, which simulate the behavior of analog resistors • Demonstrates Dueling-enabled isolation method • Can be utilized without implementation problems for Custom-VLSI designs • Alternative Approach • Alternate detector circuits for FPGA implementation are under investigation • Avoid using tri-state buffers and pull-down resistors, and use native digital components available on FPGAs