
IVF: Characterizing the Vulnerability of Microprocessor Structures to Intermittent Faults

Songjun Pan (1,2), Yu Hu (1), and Xiaowei Li (1). (1) Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences





Presentation Transcript


  1. IVF: Characterizing the Vulnerability of Microprocessor Structures to Intermittent Faults. Songjun Pan (1,2), Yu Hu (1), and Xiaowei Li (1). (1) Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences; (2) Graduate University of Chinese Academy of Sciences.

  2. Outline: Background and Related Work; IVF Computing Methodology; Experimental Results; Conclusions.

  3. Background. [Figure: bathtub curve of failure rate over lifetime, spanning the infant mortality stage, the useful life stage, and the wear-out stage; annotated for the deep submicron era with defect escape, soft errors, intermittent faults, and faster aging.] Intermittent faults are emerging as a major source of failures in microprocessors [DSN’02].

  4. Intermittent Faults. Description: occur frequently and irregularly for a period of time; caused by loose connections, manufacturing residues, process variation, or in-progress wear-out, combined with voltage and temperature fluctuations. Characteristics: occur in bursts at the same location; removed if the offending circuit is replaced; activated or deactivated by PVT (process, voltage, and temperature) variations.

  5. Protecting the Microprocessor. Information redundancy techniques (parity and error-correcting codes): high area overhead, high power consumption. Hardware redundancy techniques (dual modular redundancy / triple modular redundancy): 100%~200% area overhead. Software redundancy techniques (redundant multi-threading): 10%~30% performance overhead. Conventional protection methods ensure high reliability but also incur high overhead.

  6. Trading Off Reliability and Overhead. Key observation: not all faults lead to external program failures. A fault in the branch predictor does not matter at all; a fault in the program counter almost always matters. Which bits matter? ACE bit / un-ACE bit: Architecturally Correct Execution (ACE) bit [MICRO’03]. An ACE bit is a bit that, if changed, leads to an external error. Reliability evaluation: protect the most vulnerable structures.

  7. Related Metrics. Mean Time To Failure (MTTF) / Mean Time Between Repair (MTBR). Masking effect; structure utilization. Soft error vulnerability analysis: Architectural Vulnerability Factor (AVF) [MICRO’03]; Program Vulnerability Factor (PVF) [HPCA’09]. Hard fault vulnerability analysis: Hard-Faults AVF (H-AVF) [SIGMETRICS’06]. The vulnerability to intermittent faults is rarely considered because of their varied causes and behaviors.

  8. Our Contributions. Propose a metric, the Intermittent Vulnerability Factor (IVF), to characterize vulnerability to intermittent faults. IVF definition: a structure’s IVF is the probability that an intermittent fault in that structure causes an externally visible error. Present IVF computing algorithms for the reorder buffer and the register file. Compute IVF under different fault configurations.

  9. Intermittent Fault Models. [Figure: causes and mechanisms of intermittent faults mapped to affected locations and to logic-level fault models.] Causes and mechanisms: manufacturing residues, timing violations, oxide breakdown, inductive noise, solder joints, electromigration, crosstalk, soft breakdown, intermittent contacts, variation of metal R&C, fluctuation of leakage current. Locations named in the figure: cell, memory, buses, interconnection lines, power supply. Fault models at the logic level: intermittent indetermination, intermittent stuck-at, intermittent short, intermittent open, intermittent pulse, intermittent delay.

  10. Intermittent Stuck-at Faults. Intermittent stuck-at faults change the correct value intermittently to logic one or logic zero. Vulnerable structures: storage structures such as memory and the register file. Key parameters: burst length, active time, and inactive time; the fault has an adverse effect during the active time. [Figure: timeline of successive bursts, each one burst length long and made up of alternating active and inactive times.]

  11. IVF Computing. Determine whether an intermittent fault affects program execution: analyze ACE bits / critical time. Set the three key parameters: burst length, active time, and inactive time. Burst length: randomly generated from [10T, 30T]. Duty cycle: 50%. Start time: randomly generated. Compute IVFs for the reorder buffer and the register file. [Figure: timeline of successive bursts, each one burst length long and made up of alternating active and inactive times.]
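
The fault parameters on this slide can be turned into a concrete timeline. The following is a minimal sketch, not the authors' implementation. It assumes that T is a fixed number of cycles (the hypothetical `t_unit` parameter), that the duty cycle is active time divided by active plus inactive time, and that a burst starts with an active phase; the function name `generate_burst` is likewise made up for illustration.

```python
import random
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # half-open [start, end), measured in cycles


def generate_burst(sim_cycles: int,
                   active_time: int,
                   duty_cycle: float = 0.5,
                   t_unit: int = 1,
                   rng: Optional[random.Random] = None) -> List[Interval]:
    """Return the active intervals of one randomly placed fault burst.

    Assumptions (not stated on the slide): T == t_unit cycles, and
    duty_cycle == active_time / (active_time + inactive_time).
    """
    if not 0.0 < duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be in (0, 1]")
    rng = rng or random.Random()

    # Burst length drawn uniformly from [10T, 30T].
    burst_length = rng.randint(10 * t_unit, 30 * t_unit)
    # Inactive time follows from the duty cycle (50% gives inactive == active).
    inactive_time = round(active_time * (1.0 - duty_cycle) / duty_cycle)
    period = active_time + inactive_time

    # Random start time, chosen so the burst fits inside the simulation.
    start = rng.randrange(0, max(1, sim_cycles - burst_length))
    end = start + burst_length

    intervals: List[Interval] = []
    t = start
    while t < end:
        intervals.append((t, min(t + active_time, end)))  # clip at burst end
        t += period
    return intervals
```

Passing a seeded `random.Random` makes a fault configuration reproducible, which helps when the same fault is replayed against several benchmarks.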

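  12. IVF Computing – Reorder Buffer: ACE Bit Analysis. [Figure: planar representation of the reorder buffer along three axes (bit, entry, and cycle), with ACE regions and points B1, B2, and B3 marked; an example intermittent fault is shown on the time axis with its active and inactive times.]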
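
One way to read the ACE bit analysis is as an interval-overlap test: a fault at a given (entry, bit) location matters if one of its active intervals coincides with a cycle range in which that bit is ACE. The sketch below works under that assumption and estimates IVF as the fraction of sampled faults that overlap an ACE interval. The names `overlaps`, `fault_is_visible`, and `estimate_ivf` are hypothetical, and the sketch conservatively ignores whether the stuck-at value happens to equal the stored value.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]   # half-open [start, end), in cycles
Location = Tuple[int, int]   # (entry index, bit index)


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two half-open intervals share at least one cycle."""
    return a[0] < b[1] and b[0] < a[1]


def fault_is_visible(ace: List[Interval], active: List[Interval]) -> bool:
    """A fault counts as visible if any active interval hits any ACE interval."""
    return any(overlaps(x, y) for x in ace for y in active)


def estimate_ivf(ace_map: Dict[Location, List[Interval]],
                 faults: List[Tuple[Location, List[Interval]]]) -> float:
    """Fraction of sampled faults that overlap ACE time at their location."""
    if not faults:
        return 0.0
    hits = sum(fault_is_visible(ace_map.get(loc, []), active)
               for loc, active in faults)
    return hits / len(faults)


# Illustrative data only: two bits with ACE intervals and four sampled faults.
ace_map = {(0, 0): [(10, 20)], (1, 0): [(30, 35)]}
faults = [
    ((0, 0), [(5, 8), (12, 14)]),  # second active interval hits ACE time
    ((0, 0), [(20, 25)]),          # starts exactly when ACE time ends: no hit
    ((1, 0), [(0, 40)]),           # covers the whole ACE interval
    ((2, 0), [(0, 100)]),          # location is never ACE
]
ivf = estimate_ivf(ace_map, faults)  # 2 of the 4 faults overlap ACE time
```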

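  13. IVF Computing – Register File: Critical Time Analysis. [Figure: timeline of register version n, between versions n-1 and n+1, showing allocation, the write (W), reads R1, R2, ..., Rlast, and deallocation; one interval is marked as critical time and the others as non-critical; faults F1, F2, F3, … occur at different points along the timeline.]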
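
The critical time analysis can be sketched as follows. The slide does not state where the critical interval begins and ends, so this example assumes the usual lifetime-analysis convention: a register version is critical from its write (W) to its last read (Rlast), and non-critical before the write and after the last read. The `RegisterVersion` class and both function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # half-open [start, end), in cycles


@dataclass
class RegisterVersion:
    alloc: int                                       # allocation cycle
    write: int                                       # cycle of the write (W)
    reads: List[int] = field(default_factory=list)   # cycles of R1..Rlast
    dealloc: int = 0                                 # deallocation cycle


def critical_interval(v: RegisterVersion) -> Optional[Interval]:
    """Assumed critical time: from the write to the last read."""
    if not v.reads:
        return None  # a value that is never read has no critical time
    last_read = max(v.reads)
    if last_read <= v.write:
        return None
    return (v.write, last_read)


def critical_cycles_hit(versions: List[RegisterVersion],
                        active: List[Interval]) -> int:
    """Count cycles in which a fault is active during critical time."""
    total = 0
    for v in versions:
        crit = critical_interval(v)
        if crit is None:
            continue
        for start, end in active:
            total += max(0, min(end, crit[1]) - max(start, crit[0]))
    return total


# Illustrative data only: one register version and three fault placements.
version_n = RegisterVersion(alloc=0, write=5, reads=[8, 12], dealloc=20)
f1 = critical_cycles_hit([version_n], [(1, 4)])    # before the write
f2 = critical_cycles_hit([version_n], [(10, 15)])  # straddles the last read
f3 = critical_cycles_hit([version_n], [(13, 18)])  # after the last read
```

Under these assumptions only the second fault lands in critical time, which mirrors the way the slide places several faults at different points of one register lifetime.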

  14. Experimental Setup • Simulated processor configuration • Execution-driven simulator Sim-Alpha • Reorder buffer/register file: 80/80 entries • 4 integer ALUs, 2 integer multipliers, 2 floating-point ALUs • Hybrid branch predictor: 4K global + 2-level 1K local + 4K choice • 64KB 2-way L1 data cache, 2MB direct-mapped L2 cache • Workload • SPEC2000 integer benchmark suite • Simulate 100M instructions with SimPoint

  15. IVF vs AVF (Reorder Buffer). IVF varies significantly across benchmarks. The longer the burst length, the higher the IVF. IVF is much higher than AVF.

  16. Different Fault Configurations (Reorder Buffer). IVF varies little across burst length configurations. IVF varies significantly with active time.

  17. IVF at Entry Level (Register File). [Figure: IVF per register file entry, with entries grouped into architecture registers and renaming registers.] IVF varies across entries. Architecture registers are more vulnerable.

  18. Implications • Quantitatively guide reliability design at an early design stage and evaluate system reliability • Harden selected structures/entries for high reliability while minimizing overhead • Razor [MICRO’03] • Parshield [DSN’07] • Easily extended to analyze other structures (issue queue, load/store queue, and cache)

  19. Conclusions • Propose a methodology to characterize the vulnerability of microprocessor structures to intermittent faults • Compute IVF for the reorder buffer and the register file • IVF varies significantly both across structures and within a structure, which motivates protecting the most vulnerable structures to improve system reliability

  20. Thank You for Your Attention • Questions?
