SEFI Mitigation Techniques for Microprocessors
Space Micro Inc.
Author: David Czajkowski, (760) 815-5330, dcz@spacemicro.com
MSFC & Space Micro Mtg Agenda
• Background & need for SEFI mitigation
• Hardened Core SEFI Mitigation Description
• Hardened Core Test Setup
• Proton Radiation Test Results
• Hardened Core Design
• Hardened Core Roadmap
• Conclusions
Paper – P15
Background & Need for SEFI Mitigation
Hardened Core (aka SEFI Watchdog Controller)
Single Event Functional Interrupt: Microprocessors
[Figure: PowerPC SEFI data]
• SEFI (aka "hangs")
  • Processor hangs caused by SEU
  • Induced by protons or heavy ions
  • All CPUs susceptible
• CPU hangs result from
  • Illegal branching
  • Upsets in the program counter
  • Undefined state machine states
• Approximate rate: 1 every 100 days for SOI PowerPC (1 every 10 days for CMOS)
• SEFI problem is severe & not easily solvable
• Powering down is the current industry solution
Single Event Functional Interrupt: SDRAMs
• No known SDRAM without SEFI
• SEFI problem greater than SEU problem
• SEFI causes >5,000 errors (loss of memory)
• SEFI not correctable with Hamming EDAC
• Reed-Solomon EDAC bad for random access
• No known solution
Note: Data provided above from Maxwell Technologies
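Why a Hamming code cannot absorb a SEFI-sized burst can be illustrated with a toy example. The sketch below uses a Hamming(7,4) code, which is an assumption chosen for brevity (memory EDAC uses wider codewords, and nothing here reflects a specific SDRAM or the Maxwell data). It shows that one flipped bit per codeword is corrected, while two flipped bits in the same codeword decode to wrong data.

```python
# Toy Hamming(7,4) single-error-correcting code. Illustrative only.
DATA_POSITIONS = (3, 5, 6, 7)    # 1-indexed codeword positions holding data bits
PARITY_POSITIONS = (1, 2, 4)     # 1-indexed codeword positions holding parity bits


def syndrome(bits):
    """XOR of the 1-indexed positions of all set bits; 0 for a valid codeword."""
    s = 0
    for pos in range(1, 8):
        if bits[pos]:
            s ^= pos
    return s


def encode(nibble):
    """Encode a 4-bit value into a codeword (list index 0 is unused)."""
    bits = [0] * 8
    for i, pos in enumerate(DATA_POSITIONS):
        bits[pos] = (nibble >> i) & 1
    s = syndrome(bits)
    for pos in PARITY_POSITIONS:
        if s & pos:
            bits[pos] = 1        # choose parity so the final syndrome is 0
    return bits


def decode(bits):
    """Correct at most one flipped bit, then return the 4-bit value."""
    bits = list(bits)
    s = syndrome(bits)
    if s:
        bits[s] ^= 1             # the syndrome names the position to flip
    return sum(bits[pos] << i for i, pos in enumerate(DATA_POSITIONS))


def flip(bits, *positions):
    """Return a copy of the codeword with the given positions inverted."""
    out = list(bits)
    for pos in positions:
        out[pos] ^= 1
    return out


word = encode(0b1011)
single_error_result = decode(flip(word, 6))      # corrected back to 0b1011
double_error_result = decode(flip(word, 3, 6))   # miscorrected, not 0b1011
```

With thousands of errors per event, many codewords would take multiple hits, which is consistent with the slide's point that Hamming EDAC alone does not recover from SDRAM SEFI.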
Shuttle Upgrade SEFI Problem
Flash Parts
• CAU program
  • 36 Intel flash parts
  • No replacement
• SEFI driving system reliability over spec limit
• Even with system changes, CAU over spec limit
• Improving flash SEFI problem allows CAU to meet system reliability requirements
• Caused major redesign of 3 subsystems
Hardened Core SEFI Mitigation
Description of the Technique
New SEE Mitigation Techniques
[Figure labels: SEE Source, Hardened Core Corrects]
• SEFI Hardened Core detects and corrects SEFI faults in microprocessor
• Time-Triple Modular Redundancy corrects SEU faults in microprocessor
• Both enable the use of advanced commercial microprocessors in space computers
• Enables space computers >1,500 MIPS
Hardened Core System
[Block diagram labels: Watchdog Timer, CPU, Timer Signal, A5, Bus Controller, Memory, COMM Port 1, COMM Port 2, ADC Port 1]
• More than a watchdog
• H-Core generates periodic signal
• If OK, CPU responds
• If SEFI, H-Core:
  • Toggles interrupt
  • S/W reboot
  • H/W reset
  • Power cycle
• Post-SEFI status flags
• Recovery software code
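The behavior on this slide can be sketched as a small model. This is not Space Micro's design: it assumes the four recovery actions are tried in the order listed until the CPU answers again, and every name in it (HardenedCoreModel, SimulatedCpu, the action labels) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Recovery actions in the order the slide lists them. Trying them in this
# order, mildest first, is an assumption of this sketch.
RECOVERY_LADDER = ["toggle_interrupt", "software_reboot",
                   "hardware_reset", "power_cycle"]


@dataclass
class HardenedCoreModel:
    cpu_responds: Callable[[], bool]       # True if the CPU answers the periodic signal
    apply_action: Callable[[str], None]    # drives the named recovery action
    status_flags: List[str] = field(default_factory=list)

    def poll(self) -> str:
        """Run one timer period.

        Returns "ok", the name of the action that restored the CPU,
        or "unrecovered" if every action failed.
        """
        if self.cpu_responds():
            return "ok"
        for action in RECOVERY_LADDER:
            self.apply_action(action)
            self.status_flags.append(action)   # post-SEFI status for recovery code
            if self.cpu_responds():
                return action
        return "unrecovered"


class SimulatedCpu:
    """Stand-in CPU that, once hung, recovers only on one chosen action."""

    def __init__(self, recovers_on: str):
        self.hung = False
        self.recovers_on = recovers_on

    def responds(self) -> bool:
        return not self.hung

    def apply(self, action: str) -> None:
        if action == self.recovers_on:
            self.hung = False


cpu = SimulatedCpu(recovers_on="hardware_reset")
core = HardenedCoreModel(cpu.responds, cpu.apply)
cpu.hung = True                  # inject a SEFI
outcome = core.poll()            # escalates until the hardware reset works
```

The recorded status flags are what makes this more than a plain watchdog in the model: recovery software can read which actions were needed before deciding how to restart.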
Technical Objectives
• Determine the characteristics of SEFI on a CPU
• Develop software prototype of Hardened Core and verify performance in radiation environment
• Develop Hardened Core architecture and initial product design
• Determine SEFI rate in combination with TTMR SEU rate
• Determine performance of TTMR computer with SEFI Watchdog
Hardened Core Test Setup
SEFI Mitigation Radiation Test Set using Pentium III in Versalogic VSBC-8 Computer
Test Set Challenges
• Finding a processor that is not in a plastic, flip-chip package and has known SEFI behavior proved difficult and caused schedule risk
  • Selected a plastic, flip-chip Pentium III (850 MHz)
  • Changed to a proton radiation source to penetrate the plastic
  • Solved de-lidding & thinning issues
• Beam availability and high cost reduced the available beam time
  • Resulted in less information on SEFI signatures
• Found a "partial" hardware watchdog in the VSBC-8d
  • Provided unexpected, additional prototype data
H-Core SEU Test System
[Diagram labels: RS-232, Communication Link - Ethernet, Interrupt Lines, Reset Line]
VSBC-8d Computer
• Software & hardware include:
  • SEFI test loop
  • Diagnostic self-test routine
  • Hardware watchdog to Reset
  • Linux software watchdog
  • Local APIC (PIII) routines
  • Diagnostic self-test
  • Recovery code = display to screen
Monitor Computer
• Software routines:
  • Mode control
  • Data collection
  • SEFI identification
  • Diagnostic self-test
Pentium III SEFI Test Set
[Photo labels: VSBC Video, Monitor PC, VSBC Hardware - PIII]
VSBC & Pentium Hardware
[Photo labels: External Fan, IDE Drive, Pentium w/ Heat Sink, Network Switch, Multiplex Card, VSBC Computer]
SEFI Test Software
[Diagram labels: VSBC Response, Monitor S/W, Parallel Port S/W]
• VSBC Linux OS
• VSBC SEFI test loop
  • Ethernet & serial communication
  • Math test
  • Timer test
  • Network test
  • IDE test
• Monitor
  • Communication
  • Mode control
  • Datalog
• Parallel port control software
Selected Pentium Control Signals
• BINIT# - bus state machine reset
• INIT# - resets integer registers
• LINT0 - INTR interrupt (no available interrupt vector)
• IRQ5 - INTR hardware signal through PCI bus
• LINT1 - non-maskable interrupt (NMI)
• RESET# - PIII hardware reset
• SMI# - system management interrupt (not tested; no available interrupt vector on VSBC)
How to Connect to PIII Signals?
• NOT EASY
• Multiplex with VSBC's signals
• De-populate PIII pins & hardwire to MUX circuit
• Have good technicians
Proton Radiation Test Results
Tested Hardened Core using Intel Pentium III in Proton Environment
Hardened Core Radiation Test
• Tested at UC Davis with 51 MeV protons
• Test CPU was an Intel Pentium III, 850 MHz
• Summary results:
  • 21 SEFIs induced
  • 21 recoveries by SEFI Watchdog functions
  • IRQ, NMI and Reset brought back the Pentium III
• *** Patent Pending ***
• RESULT: Hardened Core proven with protons
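As a rough way to read the 21-of-21 result, the sketch below computes a one-sided lower confidence bound on the recovery probability when every trial succeeds. This analysis is not part of the presentation. It assumes the 21 SEFIs were independent trials with a common recovery probability, and the function names are hypothetical.

```python
import math


def lower_bound_all_successes(n_trials, confidence=0.95):
    """Exact one-sided lower bound on success probability after n of n successes.

    Solves p**n = 1 - confidence for p (the binomial bound with zero failures).
    """
    if n_trials <= 0:
        raise ValueError("n_trials must be positive")
    return (1.0 - confidence) ** (1.0 / n_trials)


def trials_needed(target_probability, confidence=0.95):
    """Failure-free trials needed to show success probability >= target."""
    if not 0.0 < target_probability < 1.0:
        raise ValueError("target_probability must be between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(target_probability))


bound_21 = lower_bound_all_successes(21)   # about 0.87 at 95% confidence
```

Under these assumptions, 21 recoveries out of 21 support a recovery probability of roughly 0.87 or better at 95% confidence; demonstrating a tighter figure requires more failure-free SEFI recoveries.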
Hardened Core is More Than a Chip!
[Block diagram labels: Hardened Core, Timer Signal, CPU, A5, Bus Controller, Memory, COMM Port 1, COMM Port 2, ADC Port 1]
• Timer code when NO SEFI
• KILL threads post SEFI
• Read H-Core status flags
• Flush cache & registers
• Recovery routines
• Rollback software routines
• Software + hardware
• Software allows for post-SEFI recovery
• Rollback data stored in memory
  • Store critical variables periodically
  • Store instruction pointer locations
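The rollback idea on this slide (store critical variables and a restart location periodically, then resume from them after a SEFI) can be sketched as follows. This is a minimal illustration, not the H-Core recovery software: the step counter stands in for a stored instruction pointer location, and RollbackStore and run_with_rollback are hypothetical names.

```python
import copy


class RollbackStore:
    """Holds the most recent checkpoint of critical state."""

    def __init__(self):
        self._step = None
        self._state = None

    def checkpoint(self, step, state):
        # Deep copy so later corruption of live state cannot reach the copy.
        self._step = step
        self._state = copy.deepcopy(state)

    def rollback(self):
        if self._step is None:
            raise RuntimeError("no checkpoint stored")
        return self._step, copy.deepcopy(self._state)


def run_with_rollback(total_steps, checkpoint_every, sefi_at=None):
    """Sum 0..total_steps-1, optionally suffering one simulated SEFI."""
    store = RollbackStore()
    state = {"accumulator": 0}
    step = 0
    sefi_pending = sefi_at is not None
    while step < total_steps:
        if step % checkpoint_every == 0:
            store.checkpoint(step, state)      # periodic save of critical variables
        if sefi_pending and step == sefi_at:
            sefi_pending = False
            state["accumulator"] = -999        # live state corrupted by the event
            step, state = store.rollback()     # resume from the last good point
            continue
        state["accumulator"] += step
        step += 1
    return state["accumulator"]


clean = run_with_rollback(10, checkpoint_every=4)
recovered = run_with_rollback(10, checkpoint_every=4, sefi_at=6)
```

In this sketch the work done since the last checkpoint is simply repeated, so the checkpoint interval trades storage traffic against the amount of work lost per SEFI.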
Programmable Hardened Core Block Diagram
• Usable for all CPUs
• Min. 8 interrupt signals
• MOSFET driver OUT for power cycle control
• Variable pulse width
• Variable timer length
  • 1 ms, 1 s, 1 min, etc.
• Status of CPU saved
  • Flags available
• External ON/OFF control
• External H-Core reset
Predicted SEU/SEFI Rates – Proton100k Computer
• SEFI 1E-2 corrected resets/day
  • Using Hardened Core
• 2,400 MIPS, 64 bits @ 400 MHz
  • >1,440 MIPS SEU corrected
• SEU < 1E-5 uncorrected errors/day
• No SEL
• Total dose > 100 krad
• 4.9 W CPU, 8 W total power
• VxWorks and Linux OS s/w
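To put the per-day rates in mission terms, the sketch below converts them to expected event counts and to the probability of seeing at least one event. It assumes events arrive as a Poisson process at a constant rate, which the slide does not state, and the 365-day mission length is a hypothetical example.

```python
import math

SEFI_CORRECTED_RESETS_PER_DAY = 1e-2     # from the slide
SEU_UNCORRECTED_ERRORS_PER_DAY = 1e-5    # the slide gives this as an upper bound


def expected_events(rate_per_day, mission_days):
    """Mean number of events over the mission."""
    return rate_per_day * mission_days


def probability_at_least_one(rate_per_day, mission_days):
    """P(one or more events) under the Poisson assumption."""
    return 1.0 - math.exp(-expected_events(rate_per_day, mission_days))


def mean_days_between_events(rate_per_day):
    """Average spacing between events, in days."""
    return 1.0 / rate_per_day


MISSION_DAYS = 365                       # hypothetical mission length
sefi_resets = expected_events(SEFI_CORRECTED_RESETS_PER_DAY, MISSION_DAYS)
p_any_sefi = probability_at_least_one(SEFI_CORRECTED_RESETS_PER_DAY, MISSION_DAYS)
p_any_seu = probability_at_least_one(SEU_UNCORRECTED_ERRORS_PER_DAY, MISSION_DAYS)
```

A rate of 1E-2 per day is one corrected reset every 100 days on average, so over a year a few corrected resets are expected, while an uncorrected SEU error remains unlikely at the stated bound.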
Hardened Core Roadmap
From Inception to Availability
Hardened Core Roadmap
• Verification of H-Core complete
• Preliminary H-Core design complete
• Design & manufacture as rad hard chip
• Improve H-Core software routines
[Roadmap figure labels: Chip Design, Preliminary Design, H-Core Inception, Benchtop Model, Radiation Verification, Software Design, Effort is Complete, Future]
Future Research Options
• Collect additional microprocessor recovery data
  • SEFI test additional processors (PowerPC, BSP-15, TI DSP)
  • SEFI test more samples (statistical improvement)
• Radiation test simpler microprocessor structures
  • State machine logic (in FPGA)
  • Instruction pointer to software (in simple micro-controller)
  • Memory cells
  • Embedded test logic
• Radiation test improved recovery software routines
  • Thread kill & cleanup routines
  • H-Core status flag check, used as pointer to restart routines
  • Restart routines
Hardened Core Planned Availability
• Hardened Core has been added to Space Micro's Proton100k computer product
• Circuit for H-Core in Actel FPGAs available now
• Stand-alone H-Core IC product available in 2004
• Application software kernels will be made available to customers
Conclusions
• SEFI is a growing problem for microprocessors
• New Hardened Core H/W + S/W solution
• Hardened Core benchtop model radiation tested
  • 850 MHz Intel Pentium III test device
  • Proton radiation testing completed
  • Results show 100% success rate
• Preliminary design of H-Core complete
  • Added to Proton100k satellite computer
• Space Micro plans to design & manufacture a rad hard chip for commercial availability