Speculative instruction validation for performance-reliability trade-off
Sumeet Kumar, SUNY Binghamton, Binghamton, NY 13902, skumar1@binghamton.edu
Aneesh Aggarwal, SUNY Binghamton, Binghamton, NY 13902, aneesh@binghamton.edu
caps.cs.binghamton.edu
What are Soft Errors?
[Figure: cosmic/alpha radiation strikes the logic between two latches driven by CLK and causes a soft error; bit patterns shown: 0001, 0000, 0000]
Micro-architectural Techniques to Detect Soft Errors
• Execute multiple copies of a program
  • Redundant Multi Threading (RMT)
• Probabilistic fault detection techniques
  • Errors are flagged if the program behavior is out of the ordinary (i.e., unpredictable)
  • Probabilistic techniques may raise many false alarms, e.g. when instructions do not behave predictably
Simultaneous and Redundantly Threaded (SRT)
• SRT is an implementation of RMT in an SMT environment
• Two copies of a program run simultaneously on a single core
• Slack is provided between the two copies for better performance
• The thread running ahead is known as the Main thread; the one running behind is known as the Redundant thread
• Provides complete fault coverage
• Has a considerable performance impact (our experiments show a 25% performance impact)
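The main/redundant relationship described above can be sketched in software. This is only a toy model under stated assumptions, not the hardware design: the three-operation instruction set, the `run_srt` function, the `slack` parameter measured in instructions, and the `faulty_index` fault knob are all hypothetical names introduced for illustration. The redundant copy is assumed to be fault-free.

```python
from collections import deque
import operator

# Hypothetical three-operation instruction set, for illustration only.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def run_srt(instructions, slack=2, faulty_index=None):
    """Return the index of the first mismatching instruction, or None."""
    pending = deque()  # main-thread results waiting for the redundant thread
    for index, (op, a, b) in enumerate(instructions):
        main_result = OPS[op](a, b)
        if index == faulty_index:
            main_result ^= 1  # model a soft error as a single flipped bit
        pending.append((index, op, a, b, main_result))
        # The redundant thread trails the main thread by `slack` instructions.
        if len(pending) > slack:
            bad = _redundant_check(pending.popleft())
            if bad is not None:
                return bad
    while pending:  # let the redundant thread catch up at the end
        bad = _redundant_check(pending.popleft())
        if bad is not None:
            return bad
    return None

def _redundant_check(entry):
    """Re-execute one instruction and compare with the main thread's result."""
    index, op, a, b, main_result = entry
    return index if OPS[op](a, b) != main_result else None
```

Because every instruction is executed twice and compared, any single corrupted result is caught, which is the intuition behind complete fault coverage; the doubled work is also where the performance cost comes from.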
Schematic Diagram of SRT
[Figure: SRT pipeline (Fetch, Decode, Rename, Issue Queue, FU, Writeback, Compare, Commit) with fetch buffer, map tables and architectural register files for both threads, register file, ROB, LSQ, LVQ, SVQ and data cache; some structures are marked ECC protected]
M – Main Thread
R – Redundant Thread
LVQ – Load Value Queue
SVQ – Store Value Queue
Performance-Reliability Trade-off in RMT
• Reducing redundancy by reacting to processor state
  • Avoiding redundant execution in high-IPC phases (PER-IRTR)
  • RMT toggling
• Reducing redundancy by exploiting instruction properties
  • Instruction Reuse concept (DIE-IRB)
  • Removing backward slices of silent stores and dead values (SS-mod) or of predictable stores (SlicK)
SpecIV (Speculative Instruction Validation)
Basic Idea
• An instruction validator (similar to a data value predictor) stores the expected result values of the main thread's instructions
• Instructions whose results match the stored value are known as successfully validated instructions
• Successfully validated instructions are not redundantly executed
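A minimal sketch of such a validator, assuming it is organized like a last-value-plus-stride value predictor indexed by instruction address. The class name, the untagged direct-mapped table, and the update policy are assumptions made for illustration; the slides do not specify the internal organization. The default of 4096 entries mirrors the 4K-entry validator used in the performance results.

```python
class InstructionValidator:
    """Toy validator: remembers (last_value, stride) per table entry."""

    def __init__(self, entries=4096):
        self.entries = entries
        self.table = {}  # index -> (last_value, stride); untagged (assumption)

    def validate(self, pc, result):
        """Return True if `result` matches the expected value, then update."""
        index = pc % self.entries
        entry = self.table.get(index)
        if entry is None:
            validated = False
            stride = 0
        else:
            last_value, old_stride = entry
            validated = (result == last_value + old_stride)
            stride = result - last_value
        self.table[index] = (result, stride)
        return validated
```

In this sketch a caller would skip the redundant copy whenever `validate` returns True and fall back to normal redundant execution otherwise:

```python
validator = InstructionValidator()
for value in (4, 8, 12):
    needs_redundant_execution = not validator.validate(0x40, value)
```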
Schematic Diagram for SpecIV
[Figure: SpecIV pipeline (Fetch, Decode, Rename, Issue Queue, Compare, Commit) with fetch buffer, physical register file, architectural register files, LVQ, SVQ, CVQ, instruction validator, OFBs and a re-execute bit-vector (shown as 0 1 0). The legend distinguishes redundant instructions that are dependent on a non-executing redundant instruction, dependent on an executing redundant instruction, or dropped]
OFB – Operand Forward Buffer
CVQ – Commit Value Queue
Undetected Errors in SpecIV
Outcome | Instruction | Correct Value | Erroneous Value | Validator Value
Undetected Error | Inst X | 10 | 11 | 11
Error Detected | Inst X | 10 | 11 | ≠11
Undetected Error | Inst X | 10 | 11 | 10 11 (labelled "Erroneous Values")
• Only interested in Single Event Upsets
• Errors in the OFB and CVQ will be detected, as they are used by the redundant thread only
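The first two cases above can be expressed as a small decision function. This is a sketch under the single event upset assumption: only the main thread's result is corrupted, and the redundant copy, when it runs, is assumed to be fault-free. The function and argument names are hypothetical.

```python
def classify(correct, produced, validator_value):
    """Classify one instruction outcome under a single event upset.

    correct         -- value a fault-free execution would produce
    produced        -- value the main thread actually produced
    validator_value -- expected value held by the instruction validator
    """
    if produced == correct:
        return "no error"
    if produced == validator_value:
        # Validation succeeds, so the redundant copy is dropped and
        # nothing is left to expose the corrupted result.
        return "undetected error"
    # Validation fails, so the redundant copy executes and the compare
    # stage sees a mismatch (redundant copy assumed fault-free).
    return "error detected"
```

With the slide's numbers, `classify(10, 11, 11)` falls in the undetected case, while any validator value other than 11 leads to detection.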
Fault Injection to Measure Vulnerability
[Figure: pipeline (Fetch, Decode, Rename, Issue Queue, FU, LSQ, Writeback, Commit) with map table, ROB, register file and architectural register file, labelled with: source architectural register, source physical register, operand value, result value, rename table decoder, register file decoder and architectural register file decoder]
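A single event upset is commonly modeled in simulation as one flipped bit in a stored value. The sketch below shows that idea only; the helper names, the 64-bit default width, and the uniform choice of bit position are assumptions for illustration and are not taken from the authors' fault injection framework.

```python
import random

def flip_bit(value, bit, width=64):
    """Return `value` with one bit inverted, truncated to `width` bits."""
    return (value ^ (1 << bit)) & ((1 << width) - 1)

def inject_fault(value, width=64, rng=random):
    """Flip one uniformly chosen bit; return (faulty_value, bit_position)."""
    bit = rng.randrange(width)
    return flip_bit(value, bit, width), bit
```

A campaign would apply `inject_fault` to one chosen site per run (for example an operand value or a result value) and record whether the error was detected, masked, or escaped.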
Hardware Setup for Experimental Results
• ROB – 164 entries
• Physical Register File – 128 Int / 128 Float
• Fetch/Decode/Commit Width – 8
• Issue Width – 5 Int / 3 Float
• Issue Queue – 48 Int / 32 Float
• Branch Predictor – Bimodal, 4K entries
Performance Results for SpecIV
[Figure: IPC results; instruction validator size – 4K entries]
Instruction Redundancy Reduction
[Figure: instruction redundancy reduction, including the average]
Reliability Results for SpecIV
[Figure: reliability results, including the average]
Sensitivity to Validator Size
[Figure: two panels, "Performance Impact Reduction" and "Error Rates"]
Performance-Reliability Trade-Off Exploration with SpecIV
[Figure: tree of performance-reliability trade-offs with two branches, "Performance Trade-Off for Better Reliability" (low or high performance impact) and "Reliability Trade-Off for Better Performance" (low or high reliability impact)]
Techniques shown:
• Avoiding Redundancy for Producers of Successful Validations
• Avoiding Low Confidence Validations
• Multi-Value Validator
• Result Width & Stride Width Validation
• Partial Result Validation
Avoiding Low Confidence Validations (Low Performance Impact)
• Stopping validations for entries with no stride reduces the total error rate from 0.45% to 0.23%, with negligible performance impact
• No additional hardware is required to implement this technique
[Figure: results for non-control instructions, including the average]
Avoiding Redundancy for Producers of Successfully Validated Instructions (Low Reliability Impact)
[Figure: example with RBIT values. Inst A (R3 op IMM, writing R1): validation unsuccessful. Inst B (R1 op IMM, writing R30): validation successful. Re-execute bit vector shown as 1 0 0]
• Redundant execution reduced by 69%
• Performance impact reduction increases to 58%
• Undetected error rate increases to 0.5%
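One way to read the producer/consumer example is sketched below: an instruction that fails validation would normally be re-executed, but if a later instruction that consumes its result validates successfully, the producer's re-execute bit is cleared as well. This is an interpretation, not the authors' implementation. The `Inst` record, the rule that only direct producers are cleared, and the convention that `True` means "re-execute" are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class Inst:
    name: str
    dest: str         # destination register
    srcs: tuple       # source registers
    validated: bool   # outcome of instruction validation

def reexecute_bits(window):
    """Return one bool per instruction; True means redundantly execute it."""
    bits = [not inst.validated for inst in window]
    last_writer = {}  # register -> index of its most recent producer
    for index, inst in enumerate(window):
        if inst.validated:
            # A validated consumer vouches for its direct producers
            # (assumption: no transitive propagation).
            for src in inst.srcs:
                producer = last_writer.get(src)
                if producer is not None:
                    bits[producer] = False
        last_writer[inst.dest] = index
    return bits
```

For the slide's pair, Inst A (writes R1, not validated) followed by Inst B (reads R1, validated) yields no re-execution for either instruction, while an unvalidated instruction with no validated consumer is still re-executed. Skipping producers this way removes more redundant work but leaves a producer's error unchecked whenever the consumer validates anyway, which is consistent with the higher undetected error rate reported.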
Conclusion
• We propose SpecIV as an effective scheme for achieving a performance-reliability trade-off
• SpecIV significantly reduces redundant execution, which substantially improves the performance of the SRT technique
• SpecIV has a very small undetected error rate
• We also explore the performance-reliability trade-off design space with schemes based on SpecIV, obtaining further performance as well as reliability gains
Thank You
Sumeet Kumar, skumar1@binghamton.edu
Aneesh Aggarwal, aneesh@binghamton.edu