Dynamic Voting Schemes to Enhance Evolutionary Repair in Reconfigurable Logic Devices
C. Milliord, C. A. Sharma, and R. F. DeMara
University of Central Florida
29 September 2005
Technical Objective: Autonomous FPGA Regeneration
NASA Moon, Mars, and Beyond: realize a service life of tens of years, with increased availability and without pre-configured spares …
Reconfiguration allows a new fault-handling paradigm. Redundancy vs. Regeneration:
• Everyday example
  • Redundancy: spare tire
  • Regeneration: can of fix-a-flat
• Overhead from unutilized spares (weight, size, power)
  • Redundancy: increases with amount of spare capacity
  • Regeneration: weakly-related to number recovery capacity
• Granularity of fault coverage (resolution where fault handled)
  • Redundancy: restricted at design-time
  • Regeneration: variable at recovery-time
• Fault-resolution latency (availability via downtime required to handle fault)
  • Redundancy: based on time required to select spare resource
  • Regeneration: based on time required to find suitable recovery
• Quality of repair (likelihood and completeness)
  • Redundancy: determined by adequacy of spares available (?)
  • Regeneration: affected by multiple characteristics (+ or -)
• Autonomous operation (recover without outside intervention)
  • Redundancy: yes
  • Regeneration: yes
Problem Statement
• FPGAs in space
  • Harsh conditions lead to faults in hardware
    • Radiation
    • Extreme temperatures
    • Mechanical stress
  • Long mission duration
• Experiment with several combinations of genetic algorithms (GAs) and voting schemes
  • Population of FPGA configurations that are physically distinct but functionally equivalent
  • Voting involves 3 or more configurations, with a majority output
• Hypothesis
  • The added space and computation associated with a voting scheme are justified by a quicker and more complete repair
EHW Environments
• Evolvable Hardware (EHW) environments enable experimental methods to research soft computing intelligent search techniques
• EHW operates by repetitive reprogramming of real-world physical devices using an iterative refinement process
• Two modes of Evolvable Hardware
  • Extrinsic evolution: genetic algorithm with simulation in the loop (software model); device "design-time" refinement; when done, build it
  • Intrinsic evolution: genetic algorithm with hardware in the loop; device "run-time" refinement; new approach to autonomous repair of failed devices
• Application: Stardust satellite
  • >100 FPGAs onboard
  • Hostile environment: radiation, thermal stress
  • How to achieve reliability to avoid mission failure?
Genetic Algorithms (GAs)
• Initial population of configurations
  • Functionally equivalent, physically distinct
• Fitness level
  • Based on number of correct outputs for all possible inputs
• Creating a new generation
  • Mutation: "100011101" -> "101011101"
  • Crossover: "101100" & "011110" -> "101110"
[Flowchart: start -> population of candidate solutions -> evaluate fitness of individuals (fitness function) -> goal reached? -> selection of parents -> crossover and mutation -> offspring -> replacement -> population]
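The mutation and crossover examples on this slide can be reproduced with a few lines of Python. This is only a minimal sketch: the function names are illustrative, and reading the crossover example as a single-point crossover at position 3 is an assumption that happens to match the bitstrings shown, not a statement of how the authors' C++ simulator works.

```python
import random


def flip_bit(chromosome: str, index: int) -> str:
    """Return a copy of the bitstring with the bit at `index` inverted."""
    flipped = "1" if chromosome[index] == "0" else "0"
    return chromosome[:index] + flipped + chromosome[index + 1:]


def mutate(chromosome: str, rate: float, rng: random.Random) -> str:
    """Flip each bit independently with probability `rate` (assumed scheme)."""
    out = chromosome
    for i in range(len(chromosome)):
        if rng.random() < rate:
            out = flip_bit(out, i)
    return out


def crossover(parent_a: str, parent_b: str, point: int) -> str:
    """Single-point crossover: head of parent_a joined to tail of parent_b."""
    return parent_a[:point] + parent_b[point:]


# The two examples from the slide.
mutated = flip_bit("100011101", 2)         # third bit flipped
child = crossover("101100", "011110", 3)   # "101" + "110"
print(mutated, child)
```

A per-bit mutation rate is used here because the slides quote rates such as .002 and .089; whether those rates are per bit or per chromosome is not stated.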
Previous Work
• [1] Re-routing scheme replaces faulty configurable logic block (CLB)
  • Time-saving method with low overhead
• [2] Triple modular redundancy (TMR) fault detection
  • On-line approach
  • High overhead and power consumption
• [3] On-line technique using a built-in self-test (BIST)
  • Limited power consumption
  • Spare resources
• [4] GA repair of integer multiplier
  • Voting system may not always outperform individual with the highest fitness
  • Initialized GA with copies of one hand-designed configuration
[1] Xu, J., Si, P., Huang, W., and Lombardi, F., “A novel fault tolerant approach for SRAM-based FPGAs”, Proceedings of the Pacific Rim Int’l Symposium, Dec. 1999, pp. 40-44.
[2] Li, Y., Li, D., and Wang, Z., “A new approach to detect-mitigate-correct radiation-induced faults for SRAM-based FPGAs in aerospace application”, Proceedings of the IEEE National Aerospace and Electronics Conference, Oct. 2000, pp. 588-594.
[3] Abramovici, M., Emmert, J., and Stroud, C., “Roving STARs: an integrated approach to on-line testing, diagnosis, and fault tolerance for FPGAs in adaptive computing systems”, Proceedings of The Third NASA DoD Workshop, July 2001, pp. 73-92.
[4] Vigander, S., “Evolutionary fault repair of electronics in space applications”, Dissertation, University of Sussex, Brighton, UK, 2001.
Experimental Setups
• Loosely Coupled (LC) Virtex System
  • PC workstation running Xilinx EDK and ISE with AVNET V2Pro PCI card
  • System-on-chip (SoC) version using PowerPC embedded in FPGA fabric now operational … results reported on previous environment
• C++ program that simulates FPGA circuit design/repair
  • Input files
    • GA parameters
    • Logic function truth table
      • Input/output pairs
    • FPGA parameters
    • Configuration properties of perfect individuals
      • Simulate repair in voting experiments
  • Output files
    • Configuration properties at selected generations
    • Data showing fitness level at each generation
      • Produce graphs
Experimental Inputs
• GA parameters
  • Population size
  • Offspring population size
  • Mutation rate
  • Tournament size (2)
  • Maximum number of generations
• FPGA parameters
  • Number of inputs (6)
  • Number of outputs (6)
  • Number of CLBs
  • Number of look-up tables (LUTs) per CLB (SW only)
  • Number of LUT select lines (SW only)
Ideal fitness = 60
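A tournament size of 2 suggests binary tournament selection of parents. The sketch below shows one common form of it. Drawing contestants without replacement and the function name `tournament_select` are assumptions made for illustration; the slides give only the tournament size.

```python
import random


def tournament_select(population, fitness, size=2, rng=None):
    """Pick `size` distinct individuals at random and return the fittest.

    `population` is a list of individuals; `fitness` is a parallel list of
    scores (higher is better, e.g. up to the ideal fitness of 60).
    """
    rng = rng or random.Random()
    contestants = rng.sample(range(len(population)), size)
    winner = max(contestants, key=lambda i: fitness[i])
    return population[winner]


# Hypothetical population of three bitstrings with made-up fitness values.
population = ["101100", "011110", "101110"]
fitness = [38, 40, 47]
parent = tournament_select(population, fitness, size=2, rng=random.Random(1))
print(parent)
```

With two distinct contestants the weakest individual can never win a tournament, which gives mild selection pressure while still letting mid-ranked individuals become parents.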
Experiment #1
• Circuit evolution, no repair
• Maximize GA performance before voting (tune parameters)
  • Maximum number of generations: 200
  • Mutation rate varied from 0.001 to 0.097 in steps of 0.004
  • Population sizes of 15, 40, and 50
  • Number of CLBs: 6, 9, 12, 16, and 36
• Evolve several perfect configurations
  • Repeated the most successful runs for 1000 generations
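To get a feel for the size of this parameter sweep, the values listed on the slide can be enumerated directly. The sketch assumes every combination was tried, which the slides do not state; the variable names are illustrative.

```python
# Mutation rates 0.001, 0.005, ..., 0.097 (step 0.004), rounded to avoid
# floating-point drift.
mutation_rates = [round(0.001 + 0.004 * k, 3) for k in range(25)]
population_sizes = [15, 40, 50]
clb_counts = [6, 9, 12, 16, 36]

# Full grid, ASSUMING every combination was run (not confirmed by the slides).
grid = [
    (rate, pop, clbs)
    for rate in mutation_rates
    for pop in population_sizes
    for clbs in clb_counts
]

print(len(mutation_rates), "mutation rates")
print(len(grid), "combinations in the full grid")
```

The sweep contains 25 mutation rates, so a full grid would have 25 x 3 x 5 = 375 combinations of 200 generations each.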
FPGA Genetic Representations
[Diagram: chromosome divided into CLB 0, CLB 1, ..., CLB n, each containing LUT 0 through LUT 3]
• Chromosome goals:
  • Allow all possible LUT configurations
  • Allow all possible CLB interconnections given constraints of routing support
  • Disallow illegal FPGA configurations and non-coding introns (junk DNA)
  • Facilitate crossover operator
• Bitstring representation is a natural choice, though it may not scale well (investigating generative representations)
• Representation shown here is a sample specific to the Xilinx Virtex FPGA
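As a rough illustration of a bitstring chromosome organized by CLB and LUT, the sketch below decodes a flat bitstring into per-CLB lists of LUT truth tables. It is a simplification under stated assumptions: four LUTs per CLB is taken from the diagram, four select lines per LUT is a hypothetical value, and the interconnection bits that the real representation must also carry are omitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Layout:
    """Hypothetical chromosome layout: LUT truth tables only, no routing."""
    n_clbs: int
    luts_per_clb: int
    select_lines: int

    @property
    def bits_per_lut(self) -> int:
        # One truth-table bit per combination of select-line values.
        return 2 ** self.select_lines

    @property
    def length(self) -> int:
        return self.n_clbs * self.luts_per_clb * self.bits_per_lut


def decode(chromosome: str, layout: Layout) -> List[List[str]]:
    """Split a bitstring into CLBs, each a list of LUT truth-table strings."""
    if len(chromosome) != layout.length or set(chromosome) - {"0", "1"}:
        raise ValueError("chromosome does not match the layout")
    step = layout.bits_per_lut
    tables = [chromosome[i:i + step] for i in range(0, len(chromosome), step)]
    per = layout.luts_per_clb
    return [tables[i:i + per] for i in range(0, len(tables), per)]


layout = Layout(n_clbs=9, luts_per_clb=4, select_lines=4)
clbs = decode("01" * (layout.length // 2), layout)
print(layout.length, len(clbs), len(clbs[0]), len(clbs[0][0]))
```

Because every bitstring of the right length decodes to a valid set of truth tables, this fixed-length layout has no illegal configurations and any cut point is a legal crossover point, in line with the chromosome goals listed above.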
Experiment #1 Results (generations = 200, population size = 50, CLBs = 9)
Perfect Individuals
• Parameters used in evolving perfect individuals (fitness of 60)
  • Maximum number of generations: 1000
  • Mutation rate: 0.002
  • Population size: 50
  • Number of CLBs: 9
• These create a diverse initial population for TMR-style voting in Experiment #2
Three-plex Experiments
• Six injected stuck-at faults on LUT inputs
  • Resulting fitness of perfect individuals: 38, 40, 47
• Parameters
  • Number of generations: 400
  • Mutation rate: 0.089
  • Population size: 50
  • Number of CLBs: 9
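A stuck-at fault on a LUT input can be modeled by forcing one address bit of the look-up to a constant before the truth table is read. The sketch below is illustrative only: the helper `lut_output`, the bit ordering, and the 2-input AND example are assumptions, not details of the authors' simulator.

```python
def lut_output(truth_table, inputs, stuck=None):
    """Evaluate a LUT given as a bitstring indexed by the input address.

    `inputs` is a sequence of 0/1 values, with input i weighted 2**i
    (assumed ordering). `stuck` maps an input index to the value it is
    stuck at (0 or 1).
    """
    stuck = stuck or {}
    address = 0
    for i, bit in enumerate(inputs):
        bit = stuck.get(i, bit)      # a stuck input ignores the applied value
        address |= bit << i
    return int(truth_table[address])


# Hypothetical 2-input AND LUT: only address 3 (both inputs high) yields 1.
AND2 = "0001"
all_inputs = [(a, b) for b in (0, 1) for a in (0, 1)]

healthy = [lut_output(AND2, x) for x in all_inputs]
faulty = [lut_output(AND2, x, stuck={0: 0}) for x in all_inputs]
mismatches = sum(h != f for h, f in zip(healthy, faulty))
print(healthy, faulty, mismatches)
```

In this toy case a stuck-at-0 on input 0 changes one of the four outputs. In the experiments, six such faults lower the fitness of previously perfect individuals from 60 to 38, 40, and 47.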
Experiment #2
• Simulating repair
  • Implement voting schemes
  • Injected stuck-at faults
• Implemented 3-plex and 5-plex voting schemes
• Chose GA/FPGA parameters according to Experiment #1
• For each voting run, graphed the fitness of the best-fit individual vs. number of generations for the voting elements and the system
• Repeated the 3-plex experiment with a single element (no voting) for 3X the number of generations
[Diagram: FPGA input data feeds the GA #1, GA #2, and GA #3 configurations; their three outputs feed a voter, which produces the FPGA output data]
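The voter in this arrangement can be sketched as a bitwise majority over the outputs of the voting elements. The sketch assumes voting is done independently on each output bit and that outputs are represented as bitstrings; both are illustrative choices, and the example values are made up.

```python
def majority_vote(outputs):
    """Bitwise majority over an odd number (3, 5, ...) of output bitstrings."""
    n = len(outputs)
    if n < 3 or n % 2 == 0:
        raise ValueError("need an odd number of at least 3 voting elements")
    width = len(outputs[0])
    if any(len(o) != width for o in outputs):
        raise ValueError("all outputs must have the same width")
    return "".join(
        "1" if sum(o[i] == "1" for o in outputs) > n // 2 else "0"
        for i in range(width)
    )


# Made-up 6-bit example: each element is wrong in a different single bit,
# yet the voted output matches the expected value.
expected = "101100"
elements = ["101101", "001100", "101000"]
voted = majority_vote(elements)
print(voted, voted == expected)
```

This shows why the system fitness can exceed the fitness of every individual element: as long as the elements' errors fall on different output bits, the majority masks them.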
Three-plex Voting Results: partial repair, max fitness = 58 at generation 68
Three-plex Voting Results: complete repair achieved at generation 302
Three-plex Voting Results: complete repair at generation 33
Three-plex Voting Results: perfect fitness is temporarily reached at generation 17
Compare: Single GA Run
• 1200 generations
  • Total GA computation equivalent to a 3-plex run for 400 generations
• 3 runs
  • Max fitness of 56 at 934 generations
  • Max fitness of 56 at 852 generations
  • Max fitness of 57 at 274 generations
• N-plex voting advantageous
  • Significantly improved the likelihood of obtaining a complete repair, with fewer total circuit evaluations
  • n × g_v << g_o for n-plex voting with g_v voting generations vs. g_o evolutionary generations without voting
Experiment #3: 5-plex
• Six injected stuck-at faults on LUT inputs
  • Resulting fitness of perfect individuals: 38, 40, 47
• Parameters
  • Number of generations: 300
  • Mutation rate: 0.089
  • Population size: 50
  • Number of CLBs: 9
Five-plex Voting Results: complete repair at generation 48
Five-plex Voting Results: complete repair at generation 34
Five-plex Voting Results: perfect fitness at generation 2
3-plex vs. 5-plex
• 3-plex scheme
  • 7 out of 10 runs reached perfect fitness
  • Average of 113.86 generations to do so
  • 5 out of 10 runs exhibited perfect fitness upon completion (400 generations)
• 5-plex scheme
  • 9 out of 10 runs reached perfect fitness
  • Average of 48.33 generations needed
  • 7 out of 10 runs exhibited perfect fitness at completion (300 generations)
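One way to compare the two schemes on equal terms is to count total GA generations across all voting elements, the same accounting the deck uses when it equates a 1200-generation single run with a 3-plex run of 400 generations. The sketch below applies that accounting to the averages above. Treating cost as elements times generations is an assumption, and the averages cover only the runs that reached perfect fitness.

```python
def total_ga_generations(n_elements, generations):
    """Total GA generations summed over all voting elements (assumed cost model)."""
    return n_elements * generations


# Average cost to first reach perfect fitness, from the figures above.
three_plex_cost = round(total_ga_generations(3, 113.86), 2)
five_plex_cost = round(total_ga_generations(5, 48.33), 2)

# Full run budgets for each scheme.
three_plex_budget = total_ga_generations(3, 400)
five_plex_budget = total_ga_generations(5, 300)

print(three_plex_cost, five_plex_cost)
print(three_plex_budget, five_plex_budget)
```

Under this accounting the 5-plex scheme reaches perfect fitness with fewer total GA generations on average (about 242 vs. about 342) despite using more elements, and both figures are well below the 1200 generations of the single-GA runs, none of which exceeded a fitness of 57.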
Conclusion
• Autonomous FPGA repair strategy combining dynamic redundancy with online evolution
• TMR-style voting is beneficial in the presence of partial refurbishment
  • Complete repair can be quickly obtained with three/five imperfectly repaired individuals
  • Improvement of fitness in an individual GA can outperform voting fitness
• Stabilization of a complete repair is more important than how quickly it is achieved
  • In all six runs where perfect fitness was obtained after 50 generations, the fitness was maintained
  • Only 5 of the 10 runs that obtained perfect fitness before 50 generations maintained that fitness for the remainder of the run
• Development board to self-contained FPGA
• Qualitative analysis of CRR model
  • Number of iterations and completeness of regeneration repair
  • Percentage of time the device remains online despite physical resource fault (availability)
• Hardware resource management
  • Optimization of hardware profile for Xilinx Virtex II Pro
• Field testing on SRAM-based FPGA in a Cubesat mission
Backup Slides • On following pages …
Previous Work on Fault Recovery
[Table: Fault Recovery Characteristics of Selected Approaches]
Normalized power consumption (energy per operation), notation:
• n: n-plex solution using n redundant devices
• r: reconfiguration cost
• g: gate-level redundancy
• s, c: updated with scan rate s on c CLBs