
Fault Tolerance in Reconfigurable Computing / FPGAs

Fault Tolerance in Reconfigurable Computing / FPGAs. Bayram Kurumahmut CMPE 516 MS Computer Engineering Bogazici University 27.04.2006. Outline. Introduction Modify Configurable Logic Block (CLB) Dynamic Serial Testing Built-In Self Healing (BISH) Hardware Voter


Presentation Transcript


  1. Fault Tolerance in Reconfigurable Computing / FPGAs Bayram Kurumahmut CMPE 516 MS Computer Engineering Bogazici University 27.04.2006

  2. Outline • Introduction • Modify Configurable Logic Block (CLB) • Dynamic Serial Testing • Built-In Self Healing (BISH) • Hardware Voter • Configurable Fault Tolerant Processor (CFTP) • Self-Checking Logic Design (SCLD) • CLB Functional Testing

  3. Introduction • Configurable Logic Block (CLB) • Interconnect Wires • Interconnect Switches • Configured by SRAM contents (configuration SRAM)

  4. Modify CLB [4] • Consider faults only in CLBs • Shift configuration data • Load only one configuration for the test • Loading a configuration is a very slow process • Shift this configuration for the following tests • Do not change the physical design of the running application • No intervention at the hardware level • Faster • Better results in test diagnosis and defect/fault tolerance

  5. Modify CLB [4] (Cont’d) • SRAM • Assumed to be fault-free • Holds the configuration data • Modified to enable shifting the configuration • Add a multiplexer • Select the shifting direction • Shift to east/west/north/south
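
The shifting idea on this slide can be pictured with a small software model. The sketch below is only an illustration: it assumes the configuration is a 2D grid with one integer per CLB and that vacated cells receive a fill value. The name `shift_configuration` and the grid representation are hypothetical and do not come from [4].

```python
from typing import List

Grid = List[List[int]]  # hypothetical model: one configuration word per CLB

# Row/column offsets for the four shifting directions named on the slide.
OFFSETS = {"north": (-1, 0), "south": (1, 0), "west": (0, -1), "east": (0, 1)}


def shift_configuration(grid: Grid, direction: str, fill: int = 0) -> Grid:
    """Return a copy of `grid` with every configuration word moved one cell."""
    rows, cols = len(grid), len(grid[0])
    d_row, d_col = OFFSETS[direction]
    shifted = [[fill] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            new_r, new_c = r + d_row, c + d_col
            # Words shifted past the array edge are dropped in this model.
            if 0 <= new_r < rows and 0 <= new_c < cols:
                shifted[new_r][new_c] = grid[r][c]
    return shifted
```

In this model, one loaded test configuration can be moved across the array by repeated calls instead of being downloaded again for every position.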

  6. Modify CLB [4] (Cont’d) • Hardware overhead • Calculate additional transistor count • Calculate device transistor count • Compare them

  7. Dynamic Serial vs Parallel [5] • Reduces test configuration time • Requires fewer I/O pins • Faster and easier

  8. Dynamic Serial vs Parallel [5] (Cont’d) • Consider unprogrammed FPGAs to test • No specific user-designed application configuration • Consider all configurations • Generate and download configurations • Time consuming • Decompose number of configurations • Find test patterns

  9. Dynamic Serial Test [5] (Cont’d) • Function unit • Multiplexers and one D-type flip-flop • Test pattern requirements for multiplexers • Detect their stuck-on/off faults • Detect stuck-at faults on all their I/O nets • Detect bridge faults on data inputs

  10. Dynamic Serial Test [5] (Cont’d) • 11 test configurations (TCs) per function unit • Provide an efficient way to test many function units in a short time • 11 TCs * 4096 = 45056 TCs for XC6216 • Apply parallel testing after this step
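
The count on this slide is a plain product of the two figures it gives. A minimal sketch of that arithmetic follows; the function name `naive_tc_count` is an illustrative assumption, and the numbers 11 and 4096 are taken directly from the slide.

```python
TCS_PER_FUNCTION_UNIT = 11  # from the slide
FUNCTION_UNITS_XC6216 = 4096  # from the slide


def naive_tc_count(function_units: int, tcs_per_unit: int = TCS_PER_FUNCTION_UNIT) -> int:
    """Test configurations needed if every function unit is configured separately."""
    return function_units * tcs_per_unit


total = naive_tc_count(FUNCTION_UNITS_XC6216)
```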

  11. Dynamic Serial Test [5] (Cont’d) • Direct Parallel Testing • Test row or column cells at the same time • TC count increases with FPGA size, 11 TCs per test unit • Not so efficient • Two-Phase Parallel Testing • Reed-Muller Propagation Chain (RMPC) • 22 TCs per test unit, constant • Locates a single faulty function unit with 4 TCs

  12. Dynamic Serial Test [5] (Cont’d) • Proposed Method • Link all function units into a chain • Test chain integrity in bypass mode • Test a function unit with its 11 TCs and corresponding test patterns (TPs) • Return to bypass mode • Repeat for the next function unit
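
The procedure on this slide can be sketched as a small simulation. Everything in the model is an assumption made for illustration: a function unit either passes its input through (bypass) or applies a configuration, a fault is modelled as an output stuck at a fixed value in test mode, and configurations are plain Python callables. The class `FunctionUnit` and the function `locate_faulty_units` are hypothetical names, not taken from [5].

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Config = Callable[[int], int]  # hypothetical stand-in for one test configuration


@dataclass
class FunctionUnit:
    stuck_output: Optional[int] = None  # assumed fault model: output stuck in test mode
    bypass: bool = True

    def evaluate(self, value: int, config: Config) -> int:
        if self.bypass:
            return value  # bypass mode forwards the input unchanged
        result = config(value)
        return result if self.stuck_output is None else self.stuck_output


def propagate(chain: List[FunctionUnit], value: int, config: Config) -> int:
    """Push one test pattern through the whole chain."""
    for unit in chain:
        value = unit.evaluate(value, config)
    return value


def locate_faulty_units(chain: List[FunctionUnit], configs: List[Config],
                        patterns: List[int]) -> List[int]:
    # Step 1: check chain integrity with every unit in bypass mode.
    for unit in chain:
        unit.bypass = True
    if any(propagate(chain, p, configs[0]) != p for p in patterns):
        raise RuntimeError("chain integrity check failed in bypass mode")
    faulty = []
    # Steps 2 to 4: test one unit at a time, then return it to bypass mode.
    for index, unit in enumerate(chain):
        unit.bypass = False
        for config in configs:
            if any(propagate(chain, p, config) != config(p) for p in patterns):
                faulty.append(index)
                break
        unit.bypass = True
    return faulty
```

Because only one unit is active at a time, a mismatch at the chain output points directly at that unit, which is how the model reflects locating a fault without extra configurations.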

  13. Dynamic Serial Test [5] (Cont’d) • Compare with parallel testing • Requires fewer TCs • 13 TCs, not 22 TCs • Locates faults without additional TCs • Uses fewer I/O pins • Simplifies test observation

  14. Dynamic Serial Test [5] (Cont’d) • Disadvantage • Propagation path length • Depends on array size • Integrate with parallel approach for large arrays • Additional i/o pins

  15. Built-In Self Healing (BISH) [8] • Run time self configuration • Implement a soft-processor • Manage and execute all procedures • Fault detection/location/repair • Modular redundancy for assurance of working correctly

  16. BISH - Submicron technology problems [8] • Single event upsets (SEU) • Radiation-induced transient errors caused by neutrons from cosmic rays • Alpha particles from packaging material • Do not physically damage the chip • Changes in memory cell values • Incorrect data • Improper instruction for processor • Increased threat of electromigration • Physical damage to chip

  17. BISH - Tasks [8] • Detection • Scan chain • Regularly capture net values • Analyze them in soft-processor • Diagnosis, Repair • Controlled also by soft-processor • Applied only to SEUs

  18. BISH - Fault Causes [8] • SEU changing a circuit register value • Possibly a transient error • Invalid in next capture after register update • SEU changing a configuration memory cell • Wrong functionality assignment on FPGA • Read back configuration • CRC check • Partial reconfiguration if an incoherency exists • Permanent physical defect on FPGA • Mark this defective area
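
The readback, CRC check and partial reconfiguration steps for a configuration-memory upset can be sketched in software. This is a simplified model under stated assumptions: configuration memory is a list of byte frames, a golden copy is available, and CRC-32 from Python's standard `zlib` module stands in for whatever checksum the device uses. The names `find_corrupted_frames` and `scrub` are hypothetical and do not come from [8].

```python
import zlib
from typing import List, Tuple


def find_corrupted_frames(readback: List[bytes], golden_crcs: List[int]) -> List[int]:
    """Compare the CRC of each read-back frame with its reference CRC."""
    return [i for i, frame in enumerate(readback)
            if zlib.crc32(frame) != golden_crcs[i]]


def scrub(readback: List[bytes], golden: List[bytes]) -> Tuple[List[bytes], List[int]]:
    """Rewrite only the frames whose CRC differs (models partial reconfiguration)."""
    golden_crcs = [zlib.crc32(frame) for frame in golden]
    bad = find_corrupted_frames(readback, golden_crcs)
    repaired = list(readback)
    for index in bad:
        repaired[index] = golden[index]  # reload just this frame
    return repaired, bad
```

A frame that fails again right after being rewritten would, in this model, be a candidate for the permanent-defect case on the slide, which the sketch does not handle.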

  19. Hardware Voter [6] • Detect and correct single errors on inputs • Bypass double errors in X1, X2, X3 by substituting erroneous data with the spare one, X4 • Unrecoverable error signal • Congruency level of accepted SEs
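
One possible reading of this voter behaviour is sketched below. It is an assumption-laden illustration, not the circuit of [6]: inputs are modelled as integers, a single error is masked by two-out-of-three agreement among X1, X2, X3, and the spare X4 is consulted only when no two primary inputs agree. The function name `vote_with_spare` is hypothetical.

```python
from typing import Optional, Tuple


def vote_with_spare(x1: int, x2: int, x3: int, x4: int) -> Tuple[Optional[int], bool]:
    """Return (voted value, unrecoverable flag) for three inputs and one spare."""
    # Two of the three primary inputs agree: a single error is masked.
    if x1 == x2 or x1 == x3:
        return x1, False
    if x2 == x3:
        return x2, False
    # Assumed double-error case: all primary inputs differ, so use the spare X4
    # to find the one primary input that is still trustworthy.
    for candidate in (x1, x2, x3):
        if candidate == x4:
            return candidate, False
    # Nothing agrees with the spare either: raise the unrecoverable error signal.
    return None, True
```

This model cannot catch two primary inputs that fail to the same wrong value; handling that case would need information beyond what the slide states.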

  20. Configurable Fault Tolerant Processor (CFTP) [2] • Applied to spacecraft onboard processing • Triple Modular Redundancy (TMR) for soft processor on FPGA • Mitigate bit errors in computation by detecting and correcting them using voting logic • On-orbit updates, reconfigurations, modifications • Detect SEU-induced configuration faults
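
The voting logic behind TMR is commonly a bitwise two-out-of-three majority. The sketch below shows that standard function on integer words; it is a generic illustration and makes no claim about how the CFTP voter in [2] is built. The names `tmr_vote` and `disagreeing_modules` are hypothetical.

```python
from typing import List


def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority: each output bit takes the value held by at least two inputs."""
    return (a & b) | (a & c) | (b & c)


def disagreeing_modules(a: int, b: int, c: int) -> List[int]:
    """Indices (0, 1, 2) of the modules whose output differs from the voted word."""
    voted = tmr_vote(a, b, c)
    return [index for index, value in enumerate((a, b, c)) if value != voted]
```

Reporting which module disagreed is what allows a detected error to be corrected afterwards, for example by reloading that module's state.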

  21. Self-Checking Logic Design (SCLD) [3] • Map Boolean functions onto the FPGA • Functional cell • Generates complementary outputs • Checker cell • Verifies correctness of final outputs • Fault: same value at both outputs • Increases the number of CLBs used but incorporates self-checking or testability features

  22. SCLD – Fault Types [3] • Single stuck-at faults in RAM cells • Single stuck-at faults on any line of a CLB • Functional faults in any multiplexer within a single CLB • Functional faults in any D-Type Flip Flop within a single CLB • Single stuck-at faults in any pass transistor connecting CLBs

  23. SCLD [3] • k-feasible • 4 inputs for functional cells • 4-feasible Boolean functions required • If not, decompose the Boolean function before mapping it onto the FPGA
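
Whether a function needs decomposing can be checked by counting the inputs it actually depends on. The sketch below assumes the usual meaning of k-feasible (a function of at most k inputs) and does a brute-force check over all input combinations, so it only suits small examples. The names `support` and `is_k_feasible` are hypothetical.

```python
from itertools import product
from typing import Callable, Set


def support(f: Callable[..., int], n: int) -> Set[int]:
    """Indices of the inputs that can change the output of the n-input function f."""
    depends_on = set()
    for bits in product((0, 1), repeat=n):
        for i in range(n):
            flipped = list(bits)
            flipped[i] ^= 1  # toggle one input and see whether the output moves
            if f(*bits) != f(*flipped):
                depends_on.add(i)
    return depends_on


def is_k_feasible(f: Callable[..., int], n: int, k: int = 4) -> bool:
    """True if f depends on at most k inputs and so fits one k-input cell."""
    return len(support(f, n)) <= k
```

A function that fails the check with k = 4 is one that would have to be decomposed before mapping.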

  24. SCLD – Algorithm [3] • Decompose a sum-of-products expression into 4-feasible expressions. Choose the expression with the minimum number of nodes • Map each expression directly into a 4-input function cell • Connect the outputs of a pair of intermediate function cells to the inputs of a checker cell, and generate the equations for each output of the checker cell • Cascade the checker cells to form a checker tree. The outputs of the function cell at the last stage are the circuit outputs.
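
The interplay of function cells and checker cells can be illustrated with the standard two-rail checker. This is a swapped-in textbook construction used as an assumption; the checker cell equations in [3] may differ. Each function cell is modelled as producing a value and its complement, and equal values on a pair signal a fault, as on the SCLD overview slide. All names are hypothetical.

```python
from functools import reduce
from typing import Callable, List, Tuple

Rail = Tuple[int, int]  # (output, complemented output)


def functional_cell(f: Callable[..., int], *inputs: int) -> Rail:
    """Model of a function cell that generates complementary outputs."""
    value = f(*inputs) & 1
    return value, value ^ 1


def checker_cell(a: Rail, b: Rail) -> Rail:
    """Standard two-rail checker: output is complementary only if both inputs are."""
    a0, a1 = a
    b0, b1 = b
    return (a0 & b0) | (a1 & b1), (a0 & b1) | (a1 & b0)


def checker_tree(pairs: List[Rail]) -> Rail:
    """Cascade checker cells over all pairs (a linear cascade in this sketch)."""
    return reduce(checker_cell, pairs)


def fault_indicated(pair: Rail) -> bool:
    return pair[0] == pair[1]
```

Any pair with equal rails forces equal rails at the end of the cascade, so one final pair is enough to observe a fault in any of the checked cells.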

  25. SCLD – Example [3]

  26. SCLD – Implementation [3]

  27. CLB Functional Testing [1] • Gate-level testing not required • Use CLB functional property • AND, OR gate or any Boolean expression • Additional hardware to apply test • Multiplexer • Example for a 2-input CLB

  28. CLB Functional Testing - Redundant Faults [1] • CLB function = AND gate • Sa0 on first data input of a multiplexer • Sa0 on second data input of a multiplexer • Sa0 on third data input of a multiplexer • Sa1 on fourth data input of a multiplexer • CLB function = OR gate • Sa0 on first data input of a multiplexer • Sa1 on second data input of a multiplexer • Sa1 on third data input of a multiplexer • Sa1 on fourth data input of a multiplexer
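
The redundant faults listed on this slide can be reproduced with a small model. The sketch assumes the 2-input CLB is a 4-to-1 multiplexer whose data inputs hold the truth table of the configured function and whose select lines are the CLB inputs; a stuck-at fault on a data input is redundant when it never changes the output. The names `lut_output` and `redundant_data_faults` are hypothetical, not taken from [1].

```python
from itertools import product
from typing import List, Sequence, Tuple


def lut_output(data: Sequence[int], a: int, b: int) -> int:
    """4-to-1 multiplexer: inputs (a, b) select one of the four data inputs."""
    return data[2 * a + b]


def redundant_data_faults(data: Sequence[int]) -> List[Tuple[int, int]]:
    """Return (data input index, stuck value) pairs that no input pattern detects."""
    redundant = []
    for index in range(4):
        for stuck in (0, 1):
            faulty = list(data)
            faulty[index] = stuck  # inject the stuck-at fault
            if all(lut_output(faulty, a, b) == lut_output(data, a, b)
                   for a, b in product((0, 1), repeat=2)):
                redundant.append((index, stuck))
    return redundant


AND_GATE = [0, 0, 0, 1]  # truth table for inputs 00, 01, 10, 11
OR_GATE = [0, 1, 1, 1]
```

With index 0 as the first data input, `redundant_data_faults(AND_GATE)` gives Sa0 on inputs one to three and Sa1 on input four, and `redundant_data_faults(OR_GATE)` gives Sa0 on input one and Sa1 on the rest, matching the two lists on the slide.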

  29. CLB Functional Testing [1] • Exhaustive testing applied • Long test length but high fault coverage • 99.81%, compared with 87.90% for gate-level testing

  30. Conclusion • Dynamically reconfigurable environments • Use flexible testing of circuits • Repair errors by partial reconfiguration • Do not disturb normal operation when a defect affects only part of the hardware • Design your processor on them to provide on-circuit self-test

  31. References • [1] Testing of FPGA Logic Cells, E. Bareisa, V. Jusas, K. Motiejunas, R. Seinauskas, 2004, ISSN 1392-1215, Elektronika ir Elektrotechnika • [2] Configurable Fault-Tolerant Processor (CFTP) for Spacecraft Onboard Processing, Charles A. Hulme, Herschel H. Loomis, Alan A. Ross, Rong Yuan, 2004 IEEE Aerospace Conference Proceedings • [3] Self-Checking Logic Design for FPGA Implementation, Parag K. Lala, Alfred L. Burress, 2003 IEEE Transactions on Instrumentation and Measurement • [4] FPGAs and Fault Tolerance, Abderrahim Doumar, Hideo Ito, 2001 The 13th International Conference on Microelectronics • [5] Fault Detection and Location of Dynamic Reconfigurable FPGAs, Chi-Feng Wu, Cheng-Wen Wu • [6] FPGA Implementation of Hardware Voter, Milos D. Krstic, Mile K. Stojcev, TELSIKS 2001 IEEE • [7] Testing the Configurability of Dynamic FPGAs, N. Park, S. J. Ruiwale, F. Lombardi, 2000 IEEE • [8] A Self-Healing Real-Time System Based on Run-Time Self Reconfiguration, Manuel G. Gericota, Gustavo R. Alves, Jose M. Ferreira, 2005 IEEE • [9] Testing Approach within FPGA-based Fault Tolerant Systems, Abderrahim Doumar, Hideo Ito, 2000 IEEE
