Reconfiguration Based Fault-Tolerant Systems Design - Survey of Approaches Jan Balach, Ondřej Novák FIT, CTU in Prague MEMICS 2010
Outline • Introduction • FPGAs and SEU • Reconfiguration based Fault-Tolerant designs • Improved testing • FT structures based on partial reconfiguration • High-performance FT design • Transistor & gate level reconfiguration • Flash-based FPGAs • Reconfigurable Electronics for Space • Conclusion
Introduction • SRAM-based FPGA • The FPGA is one of the most widely used platforms for developing new designs and systems • FPGA dependability and reliability are among the most discussed issues
FPGAs and SEU • FPGAs are sensitive to natural radiation effects; the most discussed are the so-called Single Event Upsets (SEUs) • An SEU can impact an FPGA in different ways: • Changing the configuration memory • Generating a pulse on the interconnect • Causing latch-up • Affecting a non-programmed part of the FPGA • Affecting clock domain distribution • Each situation requires a specific solution
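The first effect in the list above, a change of configuration memory, can be illustrated with a minimal sketch. It assumes a 2-input LUT modelled as a list of four configuration bits; the names `lut_eval`, `inject_seu`, and `truth_table` are hypothetical and not taken from the slides.

```python
# Illustrative model only: a 2-input LUT whose behaviour is defined entirely
# by four configuration bits, as in an SRAM-based FPGA logic cell.

def lut_eval(config_bits, a, b):
    """Return the LUT output; the inputs select one configuration bit."""
    return config_bits[(a << 1) | b]

def inject_seu(config_bits, position):
    """Return a copy of the configuration with one bit flipped (the upset)."""
    flipped = list(config_bits)
    flipped[position] ^= 1
    return flipped

def truth_table(config_bits):
    """Outputs for inputs (a, b) = 00, 01, 10, 11."""
    return [lut_eval(config_bits, a, b) for a in (0, 1) for b in (0, 1)]

AND_LUT = [0, 0, 0, 1]              # configured as a AND b
upset_lut = inject_seu(AND_LUT, 2)  # a single flipped configuration bit

print(truth_table(AND_LUT))    # [0, 0, 0, 1]
print(truth_table(upset_lut))  # [0, 0, 1, 1]
```

After the flip the cell no longer computes a AND b but simply passes input a through. The fault stays in place until the configuration is rewritten, which is why reconfiguration is a natural repair mechanism for this class of upset.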
Reconfiguration based Fault-Tolerant designs • Fault-tolerant design = redundancy • Redundancy protects only for a limited time • Reconfiguration is needed to maintain the FPGA’s FT parameters • There are different ways to use reconfiguration to achieve an FT design
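Why redundancy alone serves only for a limited time can be shown with a small sketch of triple modular redundancy (TMR, used here as one common form of redundancy). It assumes three replicas of a 2-input AND function with a bitwise majority voter; the class `TmrModule` and its methods are hypothetical illustration, not an implementation from the surveyed papers.

```python
# Illustrative model only: three replicas of one function plus a majority voter.

GOLDEN = [0, 0, 0, 1]  # golden configuration: 2-input AND

def majority(a, b, c):
    """Bitwise majority vote of three single-bit values."""
    return (a & b) | (a & c) | (b & c)

class TmrModule:
    def __init__(self):
        self.replicas = [list(GOLDEN) for _ in range(3)]

    def evaluate(self, a, b):
        index = (a << 1) | b
        return majority(*(replica[index] for replica in self.replicas))

    def upset(self, replica, bit):
        """Flip one configuration bit in one replica (a modelled SEU)."""
        self.replicas[replica][bit] ^= 1

    def reconfigure(self):
        """Rewrite all replicas from the golden configuration."""
        self.replicas = [list(GOLDEN) for _ in range(3)]

module = TmrModule()
module.upset(0, 3)
after_one_fault = module.evaluate(1, 1)    # fault masked by the voter
module.upset(1, 3)
after_two_faults = module.evaluate(1, 1)   # voter is now outvoted
module.reconfigure()
after_repair = module.evaluate(1, 1)       # redundancy restored
print(after_one_fault, after_two_faults, after_repair)  # 1 0 1
```

The voter hides the first upset, but a second upset in another replica produces a wrong output. Rewriting the configuration between the two events would have kept the module fault tolerant.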
Improved testing I. • Testing is an important part of a dependable design flow • Testing allows us to: • Verify that the design functions correctly • Localize faults • Prevent latent faults
Improved testing II. • BIST architecture based on reconfiguration • Improved Test Access Mechanism • Can incur high overhead caused by bus macros Picture from: Rozkovec, M., Novak, O., "Structural test of programmed FPGA circuits"
FT structures based on partial reconfiguration I. • Reconfiguration offers various options for implementing an FT design • The basic idea is to divide the design into smaller parts that can be reconfigured/replaced • The smaller the parts, the bigger the overhead; a trade-off has to be found
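The trade-off in the last bullet can be made concrete with a toy model. It assumes that every reconfigurable part pays a fixed interface cost (for example bus macros) and that a fault forces reconfiguration of exactly one part. The function `partition_tradeoff` and all numbers are hypothetical and chosen only for illustration.

```python
# Toy model only: fixed interface cost per part, uniform part size.

def partition_tradeoff(design_luts, n_parts, interface_luts_per_part):
    """Return (LUTs per part, relative interface overhead)."""
    luts_per_part = design_luts / n_parts
    overhead = n_parts * interface_luts_per_part / design_luts
    return luts_per_part, overhead

# Hypothetical design of 1000 LUTs with 20 interface LUTs per part.
for n_parts in (1, 10, 50):
    size, overhead = partition_tradeoff(1000, n_parts, 20)
    print(f"{n_parts:3d} parts: {size:6.0f} LUTs per part, overhead {overhead:.0%}")
```

Under these assumptions, finer partitioning shrinks the area hit by a single fault (and the part that must be reconfigured) while the interface overhead grows linearly with the number of parts.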
FT structures based on partial reconfiguration - App. A* • Each application is divided into many small, so-called partial reconfigurable modules • Reconfiguration is supervised by a partial reconfiguration controller • Good fault localization; a fault impacts a smaller area of the design; can incur high HW overhead (bus macros); synchronization issues after reconfiguration *) Straka M., Kastil J., Kotasek Z., "Fault Tolerant Structure for SRAM-based FPGA via Partial Dynamic Reconfiguration"
FT structures based on partial reconfiguration - App. B* *) Borecky J., Kohlik M., Kubatova H., Kubalik P., "Fault Coverage Improvement based on Fault Simulation and Partial Duplication"
FT structures based on partial reconfiguration - App. B • A fault impacts a relatively large part of the design • The resulting HW overhead is smaller • Synchronization after reconfiguration has to be solved
FT structures based on partial reconfiguration - App. C* • Self-repairing dual-FPGA architecture • The design is divided into columns; spare columns provide the self-repair ability • A soft microcontroller evaluates flags from the second FPGA; in case of an error, the faulty FPGA is reconfigured by the other one • Achieves a good trade-off between overhead and fault localization • Using the same bitstream in both FPGAs can be risky *) S. Mitra, W.-J. Huang, N. R. Saxena, S.-Y. Yu, E.J. McCluskey, "Reconfigurable Architecture for Autonomous Self Repair"
High-performance FT system* • The SEU rate varies with the position on the orbit, so reconfiguration can be used to switch modes • When the SEU density is lower, the system can switch to a high-performance or power-saving mode • The high-performance mode speeds up computation by 2.3x compared to standard TMR *) Jacobs, A., George, A.D., Cieslewski, G., "Reconfigurable fault tolerance: A framework for environmentally adaptive fault mitigation in space"
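The mode switching described above can be sketched as a simple threshold policy. This is only an illustration under stated assumptions: the function names, the threshold, and the upset rates are hypothetical, and the framework by Jacobs et al. may use a different decision rule.

```python
# Illustrative policy only: pick a mode per orbit segment from an expected upset rate.

def select_mode(seu_rate, threshold):
    """Use TMR when the expected upset rate is high, otherwise high-performance."""
    return "TMR" if seu_rate >= threshold else "high-performance"

def plan_orbit(seu_rates, threshold):
    """Return the mode per orbit segment and the number of reconfigurations."""
    modes = [select_mode(rate, threshold) for rate in seu_rates]
    reconfigurations = sum(1 for prev, cur in zip(modes, modes[1:]) if prev != cur)
    return modes, reconfigurations

# Hypothetical upset rates (arbitrary units) along one orbit.
rates = [0.1, 0.2, 5.0, 4.0, 0.3]
modes, reconfigurations = plan_orbit(rates, threshold=1.0)
print(modes)             # ['high-performance', 'high-performance', 'TMR', 'TMR', 'high-performance']
print(reconfigurations)  # 2
```

Each mode change costs one reconfiguration, so a practical policy would also weigh reconfiguration time against the gain from the faster mode.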
Transistor and gate level reconfiguration* • Reconfiguration is performed at the transistor/gate level • Redundant N/P diffusions can tolerate faults in silicon *) H. T. Vierhaus, "Transistor and Gate Level Self Repair for Logic Circuits"
Transistor and gate level reconfiguration • Replaces the whole faulty gate • The resulting HW overhead is between 30% and 120% • Requires supervision in the layout Taken from: H. T. Vierhaus, "Transistor and Gate Level Self Repair for Logic Circuits"
Flash-based FPGA • Configuration is stored in Flash memory • Alternative platform for developing FT designs • Intrinsically SEU-hard configuration memory • Slower than SRAM-based FPGAs • Higher voltage is required for programming
Reconfigurable Electronics for Space* • NASA rovers on Mars • On-board Xilinx FPGA • Reconfiguration performed by ASIC Analog/Digital SRAAs • The FPGA implements the digital interface between the PC and the Proto Board *) Didier Keymeulen, "Self-Repairing and Tuning Reconfigurable Electronics for Space"
Conclusion I. • Reconfiguration allows us to create FT designs in FPGAs • Reconfiguration based systems struggle with high area overhead • Synchronization issues are mostly overlooked, but they have to be solved
Conclusion II. • FPGA reconfiguration for space applications is unreliable due to the harsh environment • Most approaches don’t take industrial requirements into account • Areas like aerospace or railway can benefit from reconfiguration