Fault Tolerance in Reconfigurable Computing / FPGAs. Bayram Kurumahmut CMPE 516 MS Computer Engineering Bogazici University 27.04.2006. Outline. Introduction Modify Configurable Logic Block (CLB) Dynamic Serial Testing Built-In Self Healing (BISH) Hardware Voter
Fault Tolerance in Reconfigurable Computing / FPGAs Bayram Kurumahmut CMPE 516 MS Computer Engineering Bogazici University 27.04.2006
Outline • Introduction • Modify Configurable Logic Block (CLB) • Dynamic Serial Testing • Built-In Self Healing (BISH) • Hardware Voter • Configurable Fault Tolerant Processor (CFTP) • Self-Checking Logic Design (SCLD) • CLB Functional Testing
Introduction • Configurable Logic Block (CLB) • Interconnect Wires • Interconnect Switches • Configured by SRAM contents (configuration SRAM)
Modify CLB [4] • Consider faults only in CLBs • Shift configuration data • Load only one configuration for test (loading is a very slow process) • Shift this configuration for the next tests • Do not change the physical design of the running application • No intervention at hardware level • Faster • Better results in test diagnosis and defect/fault tolerance
Modify CLB [4] (Cont’d) • SRAM • Assumed to be fault-free • Holds the configuration data • Modify it to enable shifting of the configuration • Add a multiplexer • Decide shifting direction • Shift to east/west/north/south
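A minimal sketch of the shifting idea, assuming the configuration SRAM can be modeled as a 2-D grid holding one configuration word per CLB. The function name `shift_configuration`, the grid model, and the `fill` value for vacated cells are illustrative assumptions, not details taken from [4].

```python
def shift_configuration(grid, direction, fill=None):
    """Return a copy of a 2-D configuration grid shifted by one CLB.

    Cells shifted past the array edge are dropped; vacated cells get `fill`.
    """
    rows, cols = len(grid), len(grid[0])
    # Row/column offsets for each direction (row 0 is the north edge).
    dr, dc = {
        "north": (-1, 0),
        "south": (1, 0),
        "east": (0, 1),
        "west": (0, -1),
    }[direction]
    shifted = [[fill] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                shifted[nr][nc] = grid[r][c]
    return shifted


# Hypothetical 2 x 3 array whose last column is unused (spare).
config = [["A", "B", None],
          ["C", "D", None]]
east = shift_configuration(config, "east")
```

In this toy example the whole design moves one column east, so the first column is freed and could be tested while the application keeps its logical layout.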
Modify CLB [4] (Cont’d) • Hardware overhead • Calculate additional transistor count • Calculate device transistor count • Compare them
Dynamic Serial vs Parallel [5] • Reduces test configuration time • Requires fewer I/O pins • Faster and easier
Dynamic Serial vs Parallel [5] (Cont’d) • Consider unprogrammed FPGAs to test • No specific user-designed application configuration • Consider all configurations • Generate and download configurations • Time consuming • Decompose the number of configurations • Find test patterns
Dynamic Serial Test [5] (Cont’d) • Function unit • Multiplexers and one D-type flip-flop • Test pattern requirements for multiplexers • Detect their stuck-on/stuck-off faults • Detect stuck-at faults on all their I/O nets • Detect bridge faults on data inputs
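To make the test-pattern requirement concrete, here is a small fault-simulation sketch for a single 2-to-1 multiplexer. It covers only stuck-at faults on the I/O nets (not the stuck-on/off or bridge faults listed above), and the net names and helper functions are assumptions made for illustration, not the fault model of [5].

```python
from itertools import product


def mux2(d0, d1, sel):
    """Fault-free 2-to-1 multiplexer."""
    return d1 if sel else d0


def faulty_mux2(d0, d1, sel, fault):
    """Same multiplexer with one net stuck at a value; fault = (net, value)."""
    net, stuck = fault
    values = {"d0": d0, "d1": d1, "sel": sel}
    if net in values:
        values[net] = stuck
    out = mux2(values["d0"], values["d1"], values["sel"])
    return stuck if net == "out" else out


FAULTS = [(net, v) for net in ("d0", "d1", "sel", "out") for v in (0, 1)]
PATTERNS = list(product((0, 1), repeat=3))  # (d0, d1, sel)


def detecting_patterns(fault):
    """Input patterns whose output differs between good and faulty mux."""
    return [p for p in PATTERNS if mux2(*p) != faulty_mux2(*p, fault)]


coverage = {fault: detecting_patterns(fault) for fault in FAULTS}
```

Every stuck-at fault in this toy model has at least one detecting pattern, so a test set only needs to pick one pattern per fault.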
Dynamic Serial Test [5] (Cont’d) • 11 test configurations (TCs) per function unit • Provide an efficient way to test many function units in a short time • 11 TCs × 4096 function units = 45056 TCs for the XC6216 • Apply parallel testing after this step
Dynamic Serial Test [5] (Cont’d) • Direct Parallel Testing • Test row or column cells at the same time • TC count increases with FPGA size, 11 TCs per test unit • Not very efficient • Two-Phase Parallel Testing • Reed-Muller Propagation Chain (RMPC) • 22 TCs per test unit, constant • Locate a single faulty function unit with 4 TCs
Dynamic Serial Test [5] (Cont’d) • Proposed Method • Link all function units into a chain • Test chain integrity in bypass mode • Test a function unit with its 11 TCs and corresponding test patterns (TPs) • Return to bypass mode • Repeat for the next function unit
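The procedure above can be sketched as a small simulation. This is only an illustrative model: the `FunctionUnit` class, the convention that a fault-free unit under test inverts its input, and the `faulty_tcs` set are assumptions introduced here; only the count of 11 TCs per unit comes from the slide.

```python
NUM_TCS = 11  # test configurations per function unit (from the slide)


class FunctionUnit:
    def __init__(self, faulty_tcs=()):
        # Hypothetical fault model: TC indices under which this unit misbehaves.
        self.faulty_tcs = set(faulty_tcs)
        self.bypass = True

    def propagate(self, bit, tc=None):
        if self.bypass:
            return bit  # bypass mode: pass the signal through unchanged
        # Under test: a fault-free unit inverts its input (toy response).
        return bit if tc in self.faulty_tcs else bit ^ 1


def run_chain(units, stimulus, tc=None):
    """Drive one bit through the whole chain and return the observed output."""
    signal = stimulus
    for unit in units:
        signal = unit.propagate(signal, tc)
    return signal


def serial_test(units):
    """Return the indices of function units that fail any of their TCs."""
    for unit in units:
        unit.bypass = True
    # Step 1: check chain integrity with every unit in bypass mode.
    if any(run_chain(units, b) != b for b in (0, 1)):
        raise RuntimeError("chain broken in bypass mode")
    faulty = []
    for index, unit in enumerate(units):
        unit.bypass = False  # Step 2: put exactly one unit under test
        for tc in range(NUM_TCS):
            if any(run_chain(units, b, tc) != b ^ 1 for b in (0, 1)):
                faulty.append(index)
                break
        unit.bypass = True  # Step 3: return to bypass before the next unit
    return faulty


chain = [FunctionUnit(), FunctionUnit(faulty_tcs={3}), FunctionUnit()]
faulty_units = serial_test(chain)
```

Because only one unit is out of bypass at a time, a failing response identifies the faulty unit directly by its position in the chain.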
Dynamic Serial Test [5] (Cont’d) • Compared with parallel testing • Requires fewer TCs • 13 TCs, not 22 TCs • Locates faults without additional TCs • Uses fewer I/O pins • Simplifies test observation
Dynamic Serial Test [5] (Cont’d) • Disadvantage • Propagation path length • Depends on array size • Integrate with parallel approach for large arrays • Additional i/o pins
Built-In Self Healing (BISH) [8] • Run time self configuration • Implement a soft-processor • Manage and execute all procedures • Fault detection/location/repair • Modular redundancy for assurance of working correctly
BISH - Submicron technology problems [8] • Single event upsets (SEUs) • Radiation-induced transient errors caused by neutrons from cosmic rays • Alpha particles from packaging material • Do not physically damage the chip • Changes in memory cell values • Incorrect data • Improper instructions for the processor • Increased threat of electromigration • Physical damage to the chip
BISH - Tasks [8] • Detection • Scan chain • Regularly capture net values • Analyze them in the soft-processor • Diagnosis, Repair • Also controlled by the soft-processor • Applied only to SEUs
BISH - Fault Causes [8] • SEU changing a circuit register value • Possibly a transient error • Invalid in next capture after register update • SEU changing a configuration memory cell • Wrong functionality assignment on FPGA • Read back configuration • CRC check • Partial reconfiguration if an incoherency exists • Permanent physical defect on FPGA • Mark the defective area
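The readback, CRC check, and partial reconfiguration steps can be sketched as follows. The frame granularity, the stored golden copy, and the function names are assumptions for illustration and are not taken from [8]; `zlib.crc32` from the Python standard library stands in for whatever CRC the device actually uses.

```python
import zlib


def find_corrupted_frames(readback_frames, golden_crcs):
    """Return indices of frames whose CRC differs from the stored reference."""
    return [
        i
        for i, frame in enumerate(readback_frames)
        if zlib.crc32(frame) != golden_crcs[i]
    ]


def scrub(device_frames, golden_frames):
    """Rewrite only the corrupted frames; return the indices that were repaired."""
    golden_crcs = [zlib.crc32(frame) for frame in golden_frames]
    corrupted = find_corrupted_frames(device_frames, golden_crcs)
    for i in corrupted:
        # Stand-in for partial reconfiguration of a single frame.
        device_frames[i] = golden_frames[i]
    return corrupted


# Hypothetical configuration memory made of three 2-byte frames.
golden = [bytes([0x0F, 0xA5]), bytes([0x3C, 0x5A]), bytes([0xFF, 0x00])]
device = list(golden)
device[1] = bytes([0x3C, 0x5B])  # simulate an SEU flipping one bit
repaired = scrub(device, golden)
```

Only the frame that fails the check is rewritten, which mirrors the idea of repairing by partial rather than full reconfiguration.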
Hardware Voter [6] • Detect and correct single errors on inputs • Bypass double errors in X1, X2, X3 by substituting erroneous data with the spare one, X4 • Unrecoverable error signal • [Figure: hardware voter block diagram; labels: congruency level of accepted SEs, spare, unrecoverable error signal]
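One possible reading of this behavior, written as a sketch: majority-vote X1..X3, and only when all three disagree consult the spare X4. The exact decision rules of the voter in [6] are not given on the slide, so the function `vote_with_spare` and its tie-breaking policy are assumptions.

```python
from collections import Counter


def vote_with_spare(x1, x2, x3, x4):
    """Return (value, unrecoverable) for three primary inputs and one spare."""
    value, count = Counter((x1, x2, x3)).most_common(1)[0]
    if count >= 2:
        # No error or a single error: the majority value wins.
        return value, False
    # All three primaries disagree (double error): consult the spare.
    if x4 in (x1, x2, x3):
        return x4, False
    # The spare matches nothing: raise the unrecoverable error signal.
    return None, True


no_error = vote_with_spare(5, 5, 5, 0)
single_error = vote_with_spare(5, 9, 5, 0)
double_error = vote_with_spare(5, 9, 7, 5)
unrecoverable = vote_with_spare(5, 9, 7, 1)
```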
Configurable Fault Tolerant Processor (CFTP) [2] • Applied to spacecraft onboard processing • Triple Modular Redundancy (TMR) for a soft processor on FPGA • Mitigate bit errors in computation by detecting and correcting them using voting logic • On-orbit updates, reconfigurations, modifications • Detect SEU-induced configuration faults
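As a rough illustration of TMR voting logic, the sketch below applies the standard bitwise majority function to the outputs of three redundant modules. The function name `tmr_vote` and the module labels are hypothetical; the actual CFTP voter in [2] is implemented in hardware and may differ.

```python
def tmr_vote(a, b, c):
    """Bitwise majority of three module outputs, plus the modules that disagree."""
    voted = (a & b) | (a & c) | (b & c)
    disagreeing = [name for name, value in (("a", a), ("b", b), ("c", c))
                   if value != voted]
    return voted, disagreeing


# Module c suffers a single bit flip; the voter masks it and flags the module.
result = tmr_vote(0b1010, 0b1010, 0b1110)
```

Reporting which module disagreed is what lets a controller decide when a configuration fault should be repaired rather than just masked.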
Self-Checking Logic Design (SCLD) [3] • Map Boolean functions onto the FPGA • Functional cell • Generates complementary outputs • Checker cell • Verifies correctness of final outputs • Fault: same value on both outputs • Increases the number of CLBs used but incorporates self-checking or testability features
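A minimal sketch of a functional cell with complementary outputs, assuming a simple model in which a fault forces one of the two output rails to a fixed value. The names `functional_cell`, `is_valid`, and the `stuck_output` parameter are illustrative and not taken from [3].

```python
def functional_cell(f, inputs, stuck_output=None):
    """Return the output pair (z, z_bar) of a functional cell.

    stuck_output = (rail_index, value) hypothetically forces one rail.
    """
    z = f(*inputs)
    rails = [z, 1 - z]
    if stuck_output is not None:
        index, value = stuck_output
        rails[index] = value
    return tuple(rails)


def is_valid(rails):
    """Fault-free operation requires the two rails to be complementary."""
    return rails[0] != rails[1]


def and4(a, b, c, d):
    return a & b & c & d


good = functional_cell(and4, (1, 1, 1, 1))
bad = functional_cell(and4, (1, 1, 1, 1), stuck_output=(0, 0))
```

Note that a stuck rail is only exposed by inputs that would drive it to the opposite value, which is why such faults are detected during normal operation rather than instantly.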
SCLD – Fault Types [3] • Single stuck-at faults in RAM cells • Single stuck-at faults on any line of a CLB • Functional faults in any multiplexer within a single CLB • Functional faults in any D-Type Flip Flop within a single CLB • Single stuck-at faults in any pass transistor connecting CLBs
SCLD [3] • k-feasible • 4 inputs for functional cells • 4-feasible Boolean functions required • If not, decompose the Boolean function before mapping it onto the FPGA
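To illustrate what decomposition into 4-feasible functions means, the sketch below uses Shannon expansion on one variable. This is a swapped-in textbook technique chosen for brevity, not the decomposition procedure of [3]; the example function `f5` and the helper names are hypothetical.

```python
from itertools import product


def shannon_split(f):
    """Split f on its last input into two cofactors with one input fewer."""
    def f0(*xs):
        return f(*xs, 0)

    def f1(*xs):
        return f(*xs, 1)

    return f0, f1


def f5(a, b, c, d, e):
    """A 5-input function: too wide for a single 4-input cell."""
    return (a & b) | (c & d & e)


f0, f1 = shannon_split(f5)  # two 4-input functions, one cell each


def mapped(a, b, c, d, e):
    """A third cell (3 inputs) selects between the two cofactor outputs."""
    return f1(a, b, c, d) if e else f0(a, b, c, d)


equivalent = all(f5(*bits) == mapped(*bits) for bits in product((0, 1), repeat=5))
```

The result uses three cells, each with at most four inputs, and computes the same function as the original 5-input expression.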
SCLD – Algorithm [3] • Decompose a sum-of-products expression into 4-feasible expressions. Choose the expression with the minimum number of nodes • Map each expression directly into a 4-input function cell • Connect the outputs of a pair of intermediate function cells to the inputs of a checker cell, and generate the equations for each output of the checker cell • Cascade the checker cells to form a checker tree. The outputs of the function cell at the last stage are the circuit outputs.
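The cascading step can be sketched with a standard two-rail checker, which is an assumption here: the checker cell equations actually derived in [3] may differ. Each function cell contributes an output pair, and the tree reduces all pairs to a single pair that is complementary only if every input pair was.

```python
def two_rail_checker(x, y):
    """Standard two-rail checker: output is complementary iff both inputs are."""
    (x0, x1), (y0, y1) = x, y
    return (x0 & y0) | (x1 & y1), (x0 & y1) | (x1 & y0)


def checker_tree(pairs):
    """Cascade two-rail checkers until a single output pair remains."""
    level = list(pairs)
    while len(level) > 1:
        next_level = [
            two_rail_checker(level[i], level[i + 1])
            for i in range(0, len(level) - 1, 2)
        ]
        if len(level) % 2:
            next_level.append(level[-1])  # odd pair passes to the next level
        level = next_level
    return level[0]


fault_free = checker_tree([(1, 0), (0, 1), (1, 0)])
with_fault = checker_tree([(1, 0), (1, 1), (1, 0)])  # middle cell outputs agree
```

A non-complementary pair anywhere in the tree propagates to the root, so observing the final pair is enough to detect the fault.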
CLB Functional Testing [1] • Gate-level testing not required • Use the CLB's functional property • AND gate, OR gate, or any Boolean expression • Additional hardware to apply the test • Multiplexer • Example for a 2-input CLB
CLB Functional Testing - Redundant Faults [1] • CLB function = AND gate • Sa0 on first data input of a multiplexer • Sa0 on second data input of a multiplexer • Sa0 on third data input of a multiplexer • Sa1 on fourth data input of a multiplexer • CLB function = OR gate • Sa0 on first data input of a multiplexer • Sa1 on second data input of a multiplexer • Sa1 on third data input of a multiplexer • Sa1 on fourth data input of a multiplexer
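The redundant faults listed above can be reproduced with a small model in which a 2-input CLB is a 4-to-1 multiplexer whose data inputs hold the truth table and whose select lines are the CLB inputs. The model and the function names are assumptions for illustration; a fault is redundant when no input combination changes the output.

```python
from itertools import product


def clb(truth_table, a, b, fault=None):
    """2-input CLB as a 4-to-1 mux; fault = (data_input_index, stuck_value)."""
    data = list(truth_table)
    if fault is not None:
        index, stuck = fault
        data[index] = stuck
    return data[2 * a + b]


def redundant_faults(truth_table):
    """Stuck-at faults on mux data inputs that no input pattern can expose."""
    inputs = list(product((0, 1), repeat=2))
    faults = [(i, v) for i in range(4) for v in (0, 1)]
    return [
        fault
        for fault in faults
        if all(clb(truth_table, a, b, fault) == clb(truth_table, a, b)
               for a, b in inputs)
    ]


AND = (0, 0, 0, 1)  # outputs for inputs 00, 01, 10, 11
OR = (0, 1, 1, 1)

and_redundant = redundant_faults(AND)
or_redundant = redundant_faults(OR)
```

In this model a data input stuck at the value it already stores is indistinguishable from the fault-free CLB, which gives Sa0 on inputs 1 to 3 and Sa1 on input 4 for AND, and Sa0 on input 1 and Sa1 on inputs 2 to 4 for OR (indices are zero-based in the code).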
CLB Functional Testing [1] • Exhaustive testing applied • Long test length but high fault coverage • 99.81%, compared with 87.90% for gate-level testing
Conclusion • Dynamically reconfigurable environments • Allow flexible testing of circuits • Repair errors by partial reconfiguration • Do not disturb normal operation when part of the hardware is defective • Design your processor on them to provide self-test of the circuit
References • [1] Testing of FPGA Logic Cells, E. Bareisa, V. Jusas, K. Motiejunas, R. Seinauskas, 2004, ISSN 1392-1215, Elektronika ir Elektrotechnika • [2] Configurable Fault-Tolerant Processor (CFTP) for Spacecraft Onboard Processing, Charles A. Hulme, Herschel H. Loomis, Alan A. Ross, Rong Yuan, 2004 IEEE Aerospace Conference Proceedings • [3] Self-Checking Logic Design for FPGA Implementation, Parag K. Lala, Alfred L. Burress, 2003 IEEE Transactions on Instrumentation and Measurement • [4] FPGAs and Fault Tolerance, Abderrahim Doumar, Hideo Ito, 2001 The 13th International Conference on Microelectronics • [5] Fault Detection and Location of Dynamic Reconfigurable FPGAs, Chi-Feng Wu, Cheng-Wen Wu • [6] FPGA Implementation of Hardware Voter, Milos D. Krstic, Mile K. Stojcev, TELSIKS 2001, IEEE • [7] Testing the Configurability of Dynamic FPGAs, N. Park, S. J. Ruiwale, F. Lombardi, 2000 IEEE • [8] A Self-Healing Real-Time System Based on Run-Time Self Reconfiguration, Manuel G. Gericota, Gustavo R. Alves, Jose M. Ferreira, 2005 IEEE • [9] Testing Approach within FPGA-based Fault Tolerant Systems, Abderrahim Doumar, Hideo Ito, 2000 IEEE