SEFI Mitigation Techniques for Microprocessors

Space Micro Inc. Author: David Czajkowski, (760) 815-5330, dcz@spacemicro.com. MSFC & Space Micro meeting agenda: background & need for SEFI mitigation; Hardened Core SEFI mitigation description; Hardened Core test setup; proton radiation test results; Hardened Core design; Hardened Core roadmap; conclusions.

Presentation Transcript


  1. SEFI Mitigation Techniques for Microprocessors
     Space Micro Inc.
     Author: David Czajkowski, (760) 815-5330, dcz@spacemicro.com

  2. MSFC & Space Micro Meeting Agenda
     • Background & need for SEFI mitigation
     • Hardened Core SEFI mitigation description
     • Hardened Core test setup
     • Proton radiation test results
     • Hardened Core design
     • Hardened Core roadmap
     • Conclusions
     Paper – P15

  3. Background & Need for SEFI Mitigation: Hardened Core (aka SEFI Watchdog Controller)

  4. Single Event Functional Interrupt: Microprocessors
     [Figure: PowerPC SEFI data]
     • SEFI (aka "hangs")
        • Processor hang caused by an SEU
        • Induced by protons or heavy ions
        • All CPUs are susceptible
     • CPU hangs result from:
        • Illegal branching
        • Upsets in the program counter
        • Undefined state machines
     • Approximate rate: 1 SEFI every 100 days for SOI PowerPC (1 every 10 days for CMOS)
     • The SEFI problem is severe and not easily solved
     • Power-down is the current industry solution
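
Slide 4's mean intervals translate directly into expected SEFIs per year. The sketch below assumes SEFIs arrive as a Poisson process at the quoted mean rates. That assumption, and the function names, are illustrative and do not come from the presentation.

```python
import math

def sefis_per_year(mean_days_between: float) -> float:
    """Expected SEFIs in a 365-day year for a given mean interval (days)."""
    return 365.0 / mean_days_between

def prob_at_least_one(mean_days_between: float, mission_days: float) -> float:
    """Probability of one or more SEFIs in a mission, ASSUMING Poisson arrivals."""
    rate_per_day = 1.0 / mean_days_between
    return 1.0 - math.exp(-rate_per_day * mission_days)

soi_ppc = sefis_per_year(100.0)                 # SOI PowerPC: 3.65 SEFIs/year
cmos = sefis_per_year(10.0)                     # CMOS: 36.5 SEFIs/year
p_30_days_soi = prob_at_least_one(100.0, 30.0)  # roughly 0.26
```

Even at the better SOI rate, a 30-day mission has roughly a one-in-four chance of at least one hang. This is why the slide treats power-down as an unsatisfying solution.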

  5. Single Event Functional Interrupt: SDRAMs
     • No known SDRAM is free of SEFI
     • The SEFI problem is greater than the SEU problem
     • A SEFI causes >5,000 errors (loss of memory contents)
     • SEFI is not correctable with Hamming EDAC
     • Reed-Solomon EDAC is poorly suited to random access
     • No known solution
     Note: Data provided above from Maxwell Technologies.
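
The code level shows why Hamming EDAC cannot absorb an SDRAM SEFI. A single-error-correcting Hamming code repairs one flipped bit per codeword. When several bits in the same codeword are corrupted, as they are when a SEFI scrambles thousands of locations, the decoder "corrects" to the wrong data.

The minimal Hamming(7,4) sketch below is chosen only for illustration. Real memory EDAC typically uses wider SEC-DED codes such as (72,64), which add a parity bit to detect double errors but still cannot correct them.

```python
from typing import List

def hamming74_encode(data: List[int]) -> List[int]:
    """Encode 4 data bits as positions 1..7: p1 p2 d1 p4 d2 d3 d4."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]

def hamming74_decode(code: List[int]) -> List[int]:
    """Correct at most one flipped bit, then return the 4 data bits."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4  # 1-based error position, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

def flip(code: List[int], *positions: int) -> List[int]:
    """Return a copy of the codeword with the given 0-based bits inverted."""
    out = list(code)
    for p in positions:
        out[p] ^= 1
    return out

word = [1, 0, 1, 1]
cw = hamming74_encode(word)
single = hamming74_decode(flip(cw, 4))     # one upset: data recovered
double = hamming74_decode(flip(cw, 1, 5))  # two upsets: silently miscorrected
```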

  6. Shuttle Upgrade SEFI Problem: Flash Parts
     • CAU program
     • 36 Intel flash parts
     • No replacement available
     • SEFI drives system reliability over the spec limit
     • Even with system changes, CAU remains over the spec limit
     • Improving the flash SEFI problem allows CAU to meet system reliability requirements
     • Caused major redesign of 3 subsystems

  7. Hardened Core SEFI Mitigation: Description of the Technique

  8. New SEE Mitigation Techniques
     [Table: SEE source / Hardened Core corrects]
     • SEFI Hardened Core detects and corrects SEFI faults in the microprocessor
     • Time-Triple Modular Redundancy (TTMR) corrects SEU faults in the microprocessor
     • Both enable the use of advanced commercial microprocessors in space computers
     • Enables space computers with >1,500 MIPS
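
Slide 8 names Time-Triple Modular Redundancy (TTMR) but does not describe how it works. The sketch below is a generic illustration of time redundancy with voting, not Space Micro's actual TTMR implementation. The same computation runs three times in succession, and a bitwise majority vote combines the results, so a transient upset that corrupts one run is outvoted by the other two. `run_with_time_tmr` and the fault-injection helper are hypothetical.

```python
from typing import Callable

def majority3(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote."""
    return (a & b) | (a & c) | (b & c)

def run_with_time_tmr(compute: Callable[[], int]) -> int:
    """Execute `compute` three times sequentially and vote on the results."""
    results = [compute() for _ in range(3)]
    return majority3(*results)

def make_flaky(upset_on_call: int) -> Callable[[], int]:
    """Hypothetical fault injection: one chosen execution suffers a bit flip."""
    calls = 0
    def compute() -> int:
        nonlocal calls
        calls += 1
        value = sum(range(100))  # correct result is 4950
        if calls == upset_on_call:
            value ^= 1 << 7      # simulate an SEU flipping bit 7
        return value
    return compute

voted = run_with_time_tmr(make_flaky(2))
```

Time redundancy trades throughput for hardware. Each result costs three executions, but no extra processor is needed.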

  9. Hardened Core System
     [Figure: block diagram with watchdog timer, CPU, timer signal, bus controller, memory, COMM port 1, COMM port 2, ADC port 1]
     • More than a watchdog
     • H-Core generates a periodic signal
        • If OK, the CPU responds
        • If SEFI, the H-Core:
           • Toggles an interrupt
           • S/W reboot
           • H/W reset
           • Power cycle
     • Post SEFI status flags
     • Recovery software code
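
Slide 9 lists the H-Core's responses to a missing CPU response. One way to model them is as an escalation ladder:

- Each timer period, the H-Core expects a response from the CPU.
- Each consecutive miss triggers the next, more disruptive action.
- The action taken is latched into a status flag for recovery code to read.

This is a simulation sketch only. The escalation order, the one-step-per-missed-period policy, and all class and method names are assumptions, not the documented H-Core logic.

```python
from enum import IntEnum
from typing import List

class Action(IntEnum):
    NONE = 0
    TOGGLE_INTERRUPT = 1
    SOFTWARE_REBOOT = 2
    HARDWARE_RESET = 3
    POWER_CYCLE = 4

class WatchdogModel:
    """Toy model of an escalating SEFI watchdog (hypothetical)."""

    def __init__(self) -> None:
        self.missed = 0                       # consecutive periods without a CPU response
        self.status_flags: List[Action] = []  # post-SEFI record for recovery code

    def tick(self, cpu_responded: bool) -> Action:
        """Called once per timer period; returns the action taken this period."""
        if cpu_responded:
            self.missed = 0
            return Action.NONE
        self.missed += 1
        # Escalate one step per missed period, capped at a power cycle.
        action = Action(min(self.missed, Action.POWER_CYCLE))
        self.status_flags.append(action)
        return action

wd = WatchdogModel()
# CPU healthy, then hung for five periods, then recovered.
trace = [wd.tick(ok) for ok in [True, False, False, False, False, False, True]]
```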

  10. Technical Objectives
     • Determine the characteristics of SEFI on a CPU
     • Develop a software prototype of the Hardened Core and verify its performance in a radiation environment
     • Develop the Hardened Core architecture and initial product design
     • Determine the SEFI rate in combination with the TTMR SEU rate
     • Determine the performance of a TTMR computer with the SEFI watchdog

  11. Hardened Core Test Setup
     SEFI mitigation radiation test set using a Pentium III in a VersaLogic VSBC-8 computer

  12. Test Set Challenges
     • Finding a processor that was not plastic or flip-chip and had known SEFI behavior proved difficult and created schedule risk
        • Selected a plastic, flip-chip Pentium III (850 MHz)
        • Changed to a proton radiation source to penetrate the plastic
        • Solved de-lidding and thinning issues
     • Beam availability and high cost limited the available beam time
        • Resulting in less information on SEFI signatures
     • Found a "partial" hardware watchdog in the VSBC-8d
        • Provided unexpected, additional prototype data

  13. H-Core SEU Test System
     [Figure: VSBC-8d computer; RS-232 communication link; Ethernet; interrupt lines; reset line; monitor computer]
     VSBC-8d computer
     • Software & hardware include:
        • SEFI test loop
        • Diagnostic self-test routine
        • Hardware watchdog to reset
        • Linux software watchdog
        • Local APIC (PIII) routines
        • Diagnostic self-test
        • Recovery code = display to screen
     Monitor computer
     • Software routines:
        • Mode control
        • Data collection
        • SEFI identification
        • Diagnostic self-test
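
The presentation does not describe how the monitor computer's "SEFI identification" routine works. The sketch below shows one plausible approach. It assumes the VSBC test loop sends a timestamped heartbeat over the serial or Ethernet link, and that any gap longer than a chosen timeout counts as a SEFI. The timeout value and function name are hypothetical.

```python
from typing import List, Tuple

def find_sefis(heartbeats: List[float], timeout_s: float) -> List[Tuple[float, float]]:
    """Return (last_seen, next_seen) pairs for each heartbeat gap longer than timeout_s."""
    events = []
    for prev, curr in zip(heartbeats, heartbeats[1:]):
        if curr - prev > timeout_s:
            events.append((prev, curr))
    return events

# Heartbeats once per second, with the target silent between t=4 s and t=19 s.
beats = [0.0, 1.0, 2.0, 3.0, 4.0, 19.0, 20.0, 21.0]
events = find_sefis(beats, timeout_s=5.0)
```

The timeout has to exceed the longest legitimate pause in the test loop. Otherwise slow IDE or network tests would be logged as false SEFIs.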

  14. Pentium III SEFI Test Set
     [Photo: VSBC, video monitor, PC, VSBC hardware (PIII)]

  15. VSBC & Pentium Hardware
     [Photo: external fan, IDE drive, Pentium with heat sink, network switch, multiplex card, VSBC computer]

  16. SEFI Test Software
     [Figure: VSBC response; monitor S/W; parallel port S/W]
     • VSBC Linux OS
     • VSBC SEFI test loop
        • Ethernet & serial communication
        • Math test
        • Timer test
        • Network test
        • IDE test
     • Monitor
        • Communication
        • Mode control
        • Datalog
        • Parallel port control software
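
On the VSBC side, the SEFI test loop exercises several subsystems and reports the results to the monitor. The sketch below shows that structure. The math and timer checks are written out, and the network and IDE checks are left as stubs. The check functions, the reporting callback, and the message format are all hypothetical.

```python
import time
from typing import Callable, Dict, List

def math_test() -> bool:
    """Known-answer arithmetic check; a corrupted ALU or register state may change the result."""
    return sum(i * i for i in range(1000)) == 332833500

def timer_test() -> bool:
    """Check that the monotonic clock advances across a short sleep."""
    t0 = time.monotonic()
    time.sleep(0.001)
    return time.monotonic() > t0

def run_iteration(tests: Dict[str, Callable[[], bool]], report: Callable[[str], None]) -> bool:
    """Run every test once, report each result, and return True only if all pass."""
    all_ok = True
    for name, test in tests.items():
        ok = test()
        report(f"{name}:{'PASS' if ok else 'FAIL'}")
        all_ok = all_ok and ok
    return all_ok

log: List[str] = []
tests = {
    "math": math_test,
    "timer": timer_test,
    "network": lambda: True,  # stub: would exchange packets over Ethernet
    "ide": lambda: True,      # stub: would read/write the IDE drive
}
ok = run_iteration(tests, log.append)
```

A wrong answer shows up as a FAIL report. A hung CPU produces no report at all, which the monitor computer can detect as missing reports.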

  17. Selected Pentium Control Signals
     • BINIT#: bus state machine reset
     • INIT#: resets integer registers
     • LINT0: INTR interrupt (no available interrupt vector)
     • IRQ5: INTR hardware signal through the PCI bus
     • LINT1: non-maskable interrupt (NMI)
     • RESET#: PIII hardware reset
     • SMI#: system management interrupt (not tested; no available interrupt vector on the VSBC)

  18. How to Connect to PIII Signals?
     • NOT EASY
     • Multiplex with the VSBC's signals
     • De-populate PIII pins & hardwire them to the MUX circuit
     • Have good technicians

  19. Proton Radiation Test Results
     Tested the Hardened Core using an Intel Pentium III in a proton environment

  20. Hardened Core Radiation Test
     • Tested at UC Davis with 51 MeV protons
     • Test CPU: Intel Pentium III, 850 MHz
     • Summary results:
        • 21 SEFIs induced
        • 21 recoveries by SEFI watchdog functions
        • IRQ, NMI, and reset brought back the Pentium III
     • *** Patent Pending ***
     • RESULT: Hardened Core proven with protons
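
Twenty-one recoveries out of 21 induced SEFIs is an encouraging result from a small sample. This is why the future-work slide lists "SEFI test more samples (statistical improvement)".

With zero failures in n trials, the one-sided Clopper-Pearson lower confidence bound on the true recovery probability is α^(1/n), where 1 − α is the confidence level. The sketch below computes that bound. This is standard binomial statistics applied here for illustration; the presentation itself does not compute it.

```python
def zero_failure_lower_bound(n: int, confidence: float = 0.95) -> float:
    """One-sided Clopper-Pearson lower bound on success probability when all n trials succeed."""
    alpha = 1.0 - confidence
    return alpha ** (1.0 / n)

bound_21 = zero_failure_lower_bound(21)    # about 0.867
bound_100 = zero_failure_lower_bound(100)  # about 0.970
```

At 95% confidence, 21 of 21 supports a recovery probability above about 87%. Demonstrating above 97% would take about 100 consecutive successes.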

  21. Detailed Test Results

  22. H-Core Success Rate by Signal

  23. Hardened Core Design

  24. Hardened Core Is More Than a Chip!
     [Figure: block diagram with Hardened Core, timer signal, CPU, bus controller, memory, COMM port 1, COMM port 2, ADC port 1]
     • Timer code when no SEFI
     • Kill threads post SEFI
     • Read H-Core status flags
     • Flush cache & registers
     • Recovery routines
     • Rollback software routines
     • Software + hardware
     • Software allows for post-SEFI recovery
     • Rollback data stored in memory
        • Store critical variables periodically
        • Store instruction pointer locations
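
Slide 24's rollback scheme can be sketched as a simple checkpoint store. Critical variables and a resume location are saved to memory periodically, then restored after a SEFI. In this sketch the "instruction pointer" is modeled as a loop index identifying where to resume. The class, its fields, and the checkpoint interval are hypothetical.

```python
import copy
from dataclasses import dataclass
from typing import Any, Dict, Optional

@dataclass
class Checkpoint:
    resume_step: int           # stands in for a stored instruction pointer location
    variables: Dict[str, Any]  # critical variables captured at that point

@dataclass
class RollbackStore:
    latest: Optional[Checkpoint] = None

    def save(self, step: int, variables: Dict[str, Any]) -> None:
        # Deep copy so later in-place updates do not alter the saved state.
        self.latest = Checkpoint(step, copy.deepcopy(variables))

    def restore(self) -> Checkpoint:
        if self.latest is None:
            raise RuntimeError("no checkpoint available; fall back to a cold restart")
        return copy.deepcopy(self.latest)

def accumulate(store: RollbackStore, start: int, state: Dict[str, Any],
               stop: int, every: int = 10) -> Dict[str, Any]:
    """Add integers start..stop-1 into state['total'], checkpointing every `every` steps."""
    for step in range(start, stop):
        if step % every == 0:
            store.save(step, state)
        state["total"] += step
    return state

store = RollbackStore()
state = {"total": 0}
accumulate(store, 0, state, stop=25)  # pretend a SEFI hits after step 24
cp = store.restore()                  # post-SEFI: roll back to the step-20 checkpoint
resumed = accumulate(store, cp.resume_step, cp.variables, stop=100)
```

The work lost to a SEFI is bounded by the checkpoint interval. That interval trades memory-write overhead against re-execution time.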

  25. Programmable Hardened Core Block Diagram
     • Usable for all CPUs
     • Minimum of 8 interrupt signals
     • MOSFET driver output for power-cycle control
     • Variable pulse width
     • Variable timer length
        • 1 ms, 1 s, 1 min, etc.
     • CPU status saved
        • Flags available
     • External ON/OFF control
     • External H-Core reset
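
Slide 25's programmable features suggest a small configuration record that software or an external controller loads into the H-Core. Those features are the timer length, the pulse width, at least 8 interrupt outputs, and a power-cycle driver. The sketch below is hypothetical. The field names, the units, and packing the enabled interrupts into an 8-bit mask are assumptions.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class HCoreConfig:
    """Hypothetical H-Core configuration record (field names are illustrative)."""
    timer_period_ms: int       # e.g. 1 (1 ms), 1_000 (1 s), 60_000 (1 min)
    pulse_width_us: int        # width of the asserted interrupt/reset pulse
    interrupt_mask: int        # bit i enables interrupt output i (8 outputs)
    power_cycle_enabled: bool  # allow the MOSFET driver to cycle CPU power

    def __post_init__(self) -> None:
        if self.timer_period_ms <= 0:
            raise ValueError("timer period must be positive")
        if self.pulse_width_us <= 0:
            raise ValueError("pulse width must be positive")
        if self.pulse_width_us >= self.timer_period_ms * 1000:
            raise ValueError("pulse must be shorter than the timer period")
        if not 0 <= self.interrupt_mask <= 0xFF:
            raise ValueError("interrupt mask must fit in 8 bits")

    def enabled_outputs(self) -> List[int]:
        """Indices of the interrupt outputs enabled in the mask."""
        return [i for i in range(8) if (self.interrupt_mask >> i) & 1]

cfg = HCoreConfig(timer_period_ms=1_000, pulse_width_us=50,
                  interrupt_mask=0b0000_0101, power_cycle_enabled=True)
```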

  26. Predicted SEU/SEFI Rates: Proton100k Computer
     • SEFI: 1E-2 corrected resets/day
        • Using Hardened Core
     • 2,400 MIPS, 64 bits @ 400 MHz
     • >1,440 MIPS SEU corrected
        • SEU < 1E-5 uncorrected errors/day
     • No SEL
     • Total dose > 100 krad
     • 4.9 W CPU, 8 W total power
     • VxWorks and Linux OS software

  27. Hardened Core Roadmap: From Inception to Availability

  28. Hardened Core Roadmap
     [Figure: roadmap timeline with H-Core inception, benchtop model, radiation verification, software design, and preliminary design (effort is complete); chip design (future)]
     • Verification of H-Core complete
     • Preliminary H-Core design complete
     • Design & manufacture as a rad-hard chip
     • Improve H-Core software routines

  29. Future Research Options
     • Collect additional microprocessor recovery data
        • SEFI test additional processors (PowerPC, BSP-15, TI DSP)
        • SEFI test more samples (statistical improvement)
     • Radiation test simpler microprocessor structures
        • State machine logic (in FPGA)
        • Instruction pointer to software (in a simple microcontroller)
        • Memory cells
        • Embedded test logic
     • Radiation test improved recovery software routines
        • Thread kill & cleanup routines
        • H-Core status flag check, used as a pointer to restart routines
        • Restart routines
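
Slide 29 proposes using the H-Core status flag as a pointer to restart routines. The sketch below expresses that idea as a dispatch table keyed by the recovery action recorded in the flag. Unknown values fall back to the most conservative path. The flag encoding and the restart routines are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical encoding of the H-Core status flag: which recovery action fired.
FLAG_NONE, FLAG_INTERRUPT, FLAG_SW_REBOOT, FLAG_HW_RESET, FLAG_POWER_CYCLE = range(5)

def make_dispatcher(log: List[str]) -> Dict[int, Callable[[], None]]:
    """Map each status-flag value to the restart routine to run after recovery."""
    return {
        FLAG_NONE: lambda: log.append("normal boot"),
        FLAG_INTERRUPT: lambda: log.append("kill hung threads, resume from checkpoint"),
        FLAG_SW_REBOOT: lambda: log.append("reload application, restore rollback data"),
        FLAG_HW_RESET: lambda: log.append("re-init peripherals, restore rollback data"),
        FLAG_POWER_CYCLE: lambda: log.append("full cold start"),
    }

def restart(status_flag: int, log: List[str]) -> None:
    """Run the restart routine selected by the flag; unknown flags take the safest path."""
    routines = make_dispatcher(log)
    routines.get(status_flag, routines[FLAG_POWER_CYCLE])()

events: List[str] = []
restart(FLAG_SW_REBOOT, events)
restart(42, events)  # corrupted or unknown flag value
```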

  30. Hardened Core Planned Availability
     • Hardened Core has been added to Space Micro's Proton100k computer product
     • Circuit for H-Core in Actel FPGAs available now
     • Stand-alone H-Core IC product available in 2004
     • Application software kernels will be made available to customers

  31. Conclusions
     • SEFI is a growing problem for microprocessors
     • New Hardened Core H/W + S/W solution
     • Hardened Core benchtop model radiation tested
        • 850 MHz Intel Pentium III test device
        • Proton radiation testing completed
        • Results show a 100% success rate (21 recoveries from 21 induced SEFIs)
     • Preliminary design of H-Core complete
        • Added to Proton100k satellite computer
     • Space Micro plans to design & manufacture a rad-hard chip for commercial availability
