Performance Implications of Faults in Prediction Arrays
Nikolas Ladas, Yiannakis Sazeides (University of Cyprus), Veerle Desmet (Ghent University)
DFR'10, Pisa, Italy, 24/1/2010 (HiPEAC 2010)
Motivation
Technology scaling: opportunities and challenges
Reliability and computing tomorrow:
- Failures will not be exceptional
- Various sources of failures:
  - Manufacturing: imperfections, process variation
  - Physical phenomena: soft errors, wear-out
  - Power constraints: operation below Vcc-min
Key challenge: provide reliable operation with little or no performance degradation in the presence of faults, using low-overhead solutions
Architectural vs. Non-Architectural Faults
So far, research has mainly focused on correctness
- Emphasis on architectural structures, e.g. caches, registers, buses, ALUs
However, faults can also occur in non-architectural structures, e.g. predictor and replacement arrays
Faults in non-architectural structures may degrade performance
- Not an issue for soft errors
- Can be a problem for persistent faults: wear-out, process variation, operation below Vcc-min
Non-Architectural Resources
Arrays:
- line predictor
- branch direction predictor
- return address stack
- indirect jump predictor
- memory dependence predictor
- way, hit/miss, bank predictors
- replacement arrays (various caches)
- hysteresis arrays (various predictors)
- ...
Non-arrays:
- branch target address adder
- memory prefetch adder
- ...
[Figure: breakdown of array bits in an EV6-like core]
This talk...
- Quantify the performance implications of faults in non-architectural array structures
- Identify which non-architectural array structures are the most sensitive to faults
- Do we need to worry about protecting these structures?
Outline
- Fault model / experimental framework
- Performance implications of faults when all non-architectural arrays are faulty
- Criticality of the non-architectural arrays studied
- Fault semantics
- Conclusions and future directions
Faults and Arrays
Faults may occur in different parts of an array; we only consider cell faults.
[Figure: SRAM array organization - wordlines (WL), bitline pairs (BL/BL'), the cell grid, the row decoder, and the bitline drivers]
Array Fault Modeling: Key Parameters
Number of faults:
- consider the % of cells that are faulty: 0.125% and 0.5%
- understand performance trends with an increasing number of faults
Fault locations:
- consider random fault locations, each affecting one cell
- try to capture average behavior
Fault model:
- each faulty cell is randomly set to either stuck-at-1 or stuck-at-0
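As a concrete illustration, here is a minimal Python sketch of this fault model (not from the paper; make_fault_map and read_entry are illustrative names). It draws random single-cell fault locations, assigns each a random stuck-at value, and reads an array entry through the resulting fault map:

```python
import random

def make_fault_map(num_bits, faulty_percent, seed=None):
    """Pick random single-cell fault locations and give each a random
    stuck-at value, per the fault model above.
    Returns {bit index: stuck-at value (0 or 1)}."""
    rng = random.Random(seed)
    num_faults = round(num_bits * faulty_percent / 100.0)
    return {bit: rng.randint(0, 1)
            for bit in rng.sample(range(num_bits), num_faults)}

def read_entry(stored_value, entry_index, bits_per_entry, fault_map):
    """Read an entry through the fault map: stuck-at cells override
    whatever value was written."""
    base = entry_index * bits_per_entry
    value = stored_value
    for offset in range(bits_per_entry):
        stuck = fault_map.get(base + offset)
        if stuck == 1:
            value |= (1 << offset)   # cell stuck-at-1
        elif stuck == 0:
            value &= ~(1 << offset)  # cell stuck-at-0
    return value
```

For example, for a 65536-bit gshare array, make_fault_map(65536, 0.125) yields round(65536 x 0.125 / 100) = 82 stuck-at cells, matching the per-structure fault counts on the Experiments slide below.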
Processor Model
EV7-like processor with a 15-stage pipeline
- 4-way out-of-order; mispredictions resolved at commit
Non-architectural arrays considered:
- Line predictor array: 4K entries, 11 bits/entry
- Line predictor hysteresis array: 4K entries, 2 bits/entry
- LRU array for 2-way 64KB 64B/block I$: 512 entries, 1 bit/entry
- LRU array for 2-way 64KB 64B/block D$: 512 entries, 1 bit/entry
- Gshare direction predictor: 32K entries, 2 bits/entry
- Return address stack: 16 entries, 31 bits/entry
- Memory dependence predictor (load-wait): 1024 entries, 1 bit/entry
sim-alpha simulator; SPEC CPU 2000 benchmarks, 100M instructions from representative regions
Experiments
Baseline performance: runs with no faults
For the experiments with faults:
- in each run, all arrays with faults have the same % of faulty bits: 0.125% or 0.5%
- ALL experiments use the same 100 randomly generated fault maps (50 for each % of faulty bits)

Number of faulty bits per structure:
Structure (total bits)                         0.125%   0.5%
Gshare direction predictor (65536 bits)            82    328
Line predictor array (45056 bits)                  56    225
Line predictor hysteresis array (8192 bits)        10     41
Memory dependence predictor (1024 bits)             1      5
2-way 64KB 64B/block I$ LRU array (512 bits)        1      3
2-way 64KB 64B/block D$ LRU array (512 bits)        1      3
Return address stack (496 bits)                     1      3
Performance with 0.125% Faulty Bits (all arrays faulty)
Performance with 0.5% Faulty Bits (all arrays faulty)
Observations with All Arrays Faulty
Performance degradation is substantial even with a small % of faulty bits; both INT and FP benchmarks can degrade:

                       0.125%   0.5%
Average degradation        1%   3.5%
Max degradation           39%    53%

Degradation is benchmark specific:
- instruction mix (different number and type of vulnerable instructions)
- programs with high prediction accuracy are more vulnerable than those with low accuracy
- when a program accesses few array entries, it takes a large number of faults before a faulty entry is accessed
- some benchmarks are memory dominated
Worst-case degradation is much greater than the average
- this will cause performance variation between otherwise identical cores/chips
Are all bits equally vulnerable? Which unit(s) matter the most?
Performance for Each Structure (0.125% faulty bits)
26 benchmarks x 50 experiments for each section
Performance for Each Structure (0.5% faulty bits)
26 benchmarks x 50 experiments for each section
Observations
For the processor configuration used in this study, the various non-architectural units are not equally vulnerable to the same fraction of faults:
- the RAS and branch direction predictor are the most sensitive to faults
- the line predictor and load-wait predictor degrade performance significantly at 0.5% faults
- the 2-way I$ and D$ are not sensitive even with 0.5% faults in the LRU array
Reasons for Variable Vulnerability Across Units
The semantics of faults vary across units:
- some faults flush the pipeline, others delay the execution of an instruction, others cause a one-cycle bubble
- faults causing delays can be less severe, since they can be hidden in the shadow of a misprediction or by out-of-order execution
Units with typically higher accuracy are more vulnerable (RAS and conditional branch predictor)
Even within a unit, faults can have different semantics
Semantics of Faults for a 2-bit Replacement State

State   Action
0x      Replace
1x      No replace

A stuck-at fault in the high-order bit pins the action, whatever the replacement logic writes:
- high bit stuck-at-0: the state always reads as 0x, so the entry is Always Replaced
- high bit stuck-at-1: the state always reads as 1x, so the entry is Never Replaced
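This pinning behavior can be captured in a few lines (an illustrative sketch, not from the talk; replace_decision and stuck_hi are made-up names):

```python
def replace_decision(state, stuck_hi=None):
    """2-bit replacement state: high bit 0 -> replace, 1 -> no replace.
    A stuck-at fault on the high bit (stuck_hi = 0 or 1) overrides
    whatever value the replacement logic wrote."""
    hi = (state >> 1) & 1 if stuck_hi is None else stuck_hi
    return hi == 0  # True = replace

# High bit stuck-at-0: the entry is replaced on every access.
assert all(replace_decision(s, stuck_hi=0) for s in range(4))
# High bit stuck-at-1: the entry is never replaced.
assert not any(replace_decision(s, stuck_hi=1) for s in range(4))
```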
Repair Mechanism: XOR Remapping
- Access map: counts accesses per entry during an interval
- Fault map: indicates which entries are faulty (can be determined at manufacturing test, or at very coarse intervals using BIST)
- Remap the index using XOR to minimize faulty accesses
- At regular intervals, search for the optimal XOR value using the access map and fault map
[Figure: access map and fault map before and after XOR remapping; faulty accesses: 14370]
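A minimal sketch of the interval search described above, assuming a power-of-two array so every XORed index stays in range (faulty_accesses and best_xor are illustrative names, not from the talk):

```python
def faulty_accesses(xor_value, access_map, fault_map):
    """Count the accesses that land on a faulty entry after the
    index is remapped as index ^ xor_value."""
    return sum(count for index, count in enumerate(access_map)
               if fault_map[index ^ xor_value])

def best_xor(access_map, fault_map):
    """Exhaustively try every XOR value and keep the one that steers
    the most accesses away from faulty entries."""
    return min(range(len(access_map)),
               key=lambda x: faulty_accesses(x, access_map, fault_map))

# access_map[i] = accesses to entry i in the last interval;
# fault_map[i]  = True if entry i is faulty (e.g. found by BIST)
access_map = [90, 5, 3, 2, 0, 0, 0, 0]
fault_map  = [True, False, False, False, False, False, False, False]
x = best_xor(access_map, fault_map)  # remaps the hot faulty entry away
assert faulty_accesses(x, access_map, fault_map) == 0
```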
Results
- 26 benchmarks x 10 fault maps per category
- Recovers most of the performance degradation
- Possible to make things worse if we remap when there is no need
Summary and Conclusions
- Faults in non-architectural arrays can degrade processor performance
- Not all faults are equally important; fault semantics vary
- The RAS and conditional branch predictor are the most critical
- Faults can cause performance non-determinism across otherwise identical chips, or within the cores of the same chip
Future Work
- Develop an analytical model to predict the performance distribution for a given failure rate
- Understand the implications of faults for other architectural and non-architectural structures
Acknowledgments
Costas Kourougiannis
Funding: University of Cyprus, Ghent University, HiPEAC, Intel
Fault Semantics
Line predictor array: incorrect prediction
- conditional branches and returns get corrected within a cycle; indirect jumps are resolved much later
Line predictor hysteresis array:
- always update the prediction on a misprediction, or never update
2-way 64KB 64B/block I$ and D$ LRU arrays:
- a faulty LRU bit converts a set into a direct-mapped set: more misses, but the cost can be hidden
Gshare direction predictor:
- faulty entries always predict taken or always not-taken
- incorrect prediction that gets resolved late (25% chance of being lucky)
Return address stack:
- a return misprediction is resolved late
Memory dependence predictor (load-wait):
- independent load waits (the common case, where we should not wait): can be partially hidden
- dependent load does not wait (this should rarely be a serious problem)
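For instance, the two load-wait fault semantics above differ sharply in cost. An illustrative sketch (load_wait_outcome is a made-up name, and the outcome labels paraphrase the slide; the flush-at-commit penalty follows the processor model used here):

```python
def load_wait_outcome(stored_bit, load_is_dependent, stuck=None):
    """Load-wait predictor: a set bit makes the load wait for older
    stores. A stuck-at cell (stuck = 0 or 1) overrides the stored bit."""
    bit = stored_bit if stuck is None else stuck
    if bit and not load_is_dependent:
        # common case: a needless delay, often hidden by out-of-order execution
        return "delay"
    if not bit and load_is_dependent:
        # memory-order violation, resolved at commit
        return "flush"
    return "ok"

# stuck-at-1: every independent load through this entry is delayed
print(load_wait_outcome(0, load_is_dependent=False, stuck=1))  # delay
# stuck-at-0: a dependent load issues early and forces a flush
print(load_wait_outcome(1, load_is_dependent=True, stuck=0))   # flush
```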
Functional Faults and Array Logical View
Not practical to study faults at the physical level
Functional models: abstractions that ease the study of faults
- fault locations: cell, input address, input/output data
- we only consider cell faults
Interleaved vs. Non-Interleaved Design Style (1)
- Each array wordline contains many entries
- Entries in the physical implementation are bit-interleaved
- More area efficient
Interleaved vs. Non-Interleaved Design Style (2)
But a cluster of faults affects more entries in an interleaved design (see the sketch below)
For architectural structures:
- soft errors prefer interleaved
- hard errors: map to a spare / disable the block or set
For non-architectural structures:
- soft errors: no need for protection
- hard errors: prefer non-interleaved (if area is not an issue)
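To make the contrast concrete, a small sketch (entries_hit is an illustrative name) mapping a cluster of adjacent faulty cells within one wordline to the logical entries it corrupts under each layout:

```python
def entries_hit(cluster_cells, bits_per_entry, entries_per_row, interleaved):
    """Map a cluster of adjacent cell positions within one wordline to
    the set of logical entries they corrupt."""
    hit = set()
    for cell in cluster_cells:
        if interleaved:
            # bit i of every entry is stored together, so adjacent
            # cells belong to *different* entries
            hit.add(cell % entries_per_row)
        else:
            # each entry's bits are contiguous, so adjacent cells
            # tend to fall in the *same* entry
            hit.add(cell // bits_per_entry)
    return hit

# a 4-cell cluster in a wordline holding 8 entries of 4 bits each
cluster = [8, 9, 10, 11]
print(entries_hit(cluster, 4, 8, interleaved=True))   # {0, 1, 2, 3}: 4 entries
print(entries_hit(cluster, 4, 8, interleaved=False))  # {2}: 1 entry
```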
Expected Invariants
- With increasing faults, more performance degradation
- Frequently accessed entries are more critical than rarely accessed entries
- A cell stuck-at-1 is more critical if the bits stored in the cell are biased towards zero