TurboROB: A Low Cost Checkpoint/Restore Accelerator • Patrick Akl (now with AMD/ATI) and Andreas Moshovos • AENAO Research Group, Department of Electrical and Computer Engineering, University of Toronto
What Happens on a Branch Misprediction? • Execution timeline: predict a branch outcome; follow the predicted path; misprediction discovered; recover processor state; redirect fetch; resume execution on the correct path • We wish to make the recovery fast
Recovery Mechanisms Overview • ROB: buffer all changes; slow • Instantaneous checkpoints: snapshot before speculating; fast; problem: can’t have enough checkpoints • Checkpoint prediction: allocate the few checkpoints judiciously • Speculation control: sometimes deeper speculation = higher recovery cost, which can hurt performance; throttle speculation
TurboROB Overview • Complements or replaces existing mechanisms: ROB recovers at any point; TurboROB recovers only at frequent points • Improves performance for most programs: misprediction performance penalty reduced by 28% on average • BranchTap comes “for free”: very simple to implement • Better than more accurate checkpoint predictors
Outline • Background • BranchTap • Methodology and Results • Summary
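State Recovery Example: Register Alias Table • The RAT has one entry per architectural register (# arch. regs entries, indexed with lg(# arch. regs) bits); each entry names a physical register • Original code: A: add r1, r2, 100; B: breq r1, E; C: sub r1, r2, r2 • Renamed code: A: add p4, p2, 100; B: breq p4, E; C: sub p5, p2, p2 • [Figure: RAT contents as the code is renamed; the entry for r1 goes from p1 to p4 (after A) to p5 (after C)]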
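The renaming step in this example can be sketched in a few lines of Python. This is a minimal illustration, not the hardware described in the talk: the function name `rename`, the tuple encoding of instructions, the free list, and the initial mapping of r3 to p3 are all assumptions made for the sketch.

```python
# Each instruction is (opcode, architectural destination or None, source operands).
CODE = [
    ("add", "r1", ["r2", "100"]),   # A
    ("breq", None, ["r1", "E"]),    # B
    ("sub", "r1", ["r2", "r2"]),    # C
]

def rename(code, rat, free_list):
    """Rename architectural registers to physical registers through a RAT.

    rat maps architectural register -> physical register and is updated in place.
    free_list holds unused physical registers, oldest first (an assumption).
    """
    renamed = []
    for op, dest, srcs in code:
        # Sources read the current mapping; immediates and labels pass through.
        new_srcs = [rat.get(s, s) for s in srcs]
        new_dest = None
        if dest is not None:
            # The destination gets a fresh physical register and the RAT is updated.
            new_dest = free_list.pop(0)
            rat[dest] = new_dest
        renamed.append((op, new_dest, new_srcs))
    return renamed

rat = {"r1": "p1", "r2": "p2", "r3": "p3"}   # assumed initial mapping
for instruction in rename(CODE, rat, ["p4", "p5"]):
    print(instruction)
```

After the call, the RAT entry for r1 holds p5. Recovering from a misprediction at B means getting that entry back to p4, which is the state the recovery mechanisms have to restore.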
ROB: Slow, Fine-Grain Recovery • Each reorder buffer entry, kept in program order, contains the architectural destination register and its previous RAT map • Recovery: 1. Misprediction discovered (the RAT is now invalid) 2. Locate newest instruction 3. Undo RAT updates in reverse order • Too slow: recovery latency is proportional to the number of instructions to squash
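The ROB walk described on this slide can be sketched as follows. The list-based ROB, the function name `rob_recover`, and the returned step count are assumptions for illustration; only the entry contents (architectural destination and previous RAT map) and the reverse-order undo come from the slide.

```python
def rob_recover(rat, rob, branch_index):
    """Undo the RAT updates of every ROB entry younger than rob[branch_index].

    rob is a list in program order; each entry is
    (architectural destination or None, previous RAT mapping or None).
    Returns the number of squashed entries, a stand-in for recovery latency.
    """
    steps = 0
    while len(rob) > branch_index + 1:
        dest, prev_map = rob.pop()      # start from the newest instruction
        if dest is not None:
            rat[dest] = prev_map        # restore the mapping this entry overwrote
        steps += 1
    return steps

# State after renaming A (add), B (breq), C (sub) from the RAT example.
rat = {"r1": "p5", "r2": "p2"}
rob = [("r1", "p1"), (None, None), ("r1", "p4")]
print(rob_recover(rat, rob, branch_index=1), rat)
```

The loop runs once per squashed instruction, which is the proportional latency the slide points to.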
Global Checkpoints: Fast, Coarse-Grain Recovery • [Figure: reorder buffer in program order, with checkpoints taken at some of the branches] • Misprediction discovered: the RAT is invalid • Branch w/ GC: recovery is “instantaneous”
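A global checkpoint can be modeled as a full copy of the RAT taken at a branch. The sketch below is an illustration under stated assumptions: the class name `CheckpointedRAT`, the branch tags, and the choice to release the restored checkpoint together with all younger ones are not taken from the slides.

```python
class CheckpointedRAT:
    """A RAT working copy plus a small, fixed number of full snapshots."""

    def __init__(self, mapping, max_checkpoints):
        self.rat = dict(mapping)
        self.max_checkpoints = max_checkpoints
        self.checkpoints = {}           # branch tag -> snapshot, oldest first

    def take_checkpoint(self, tag):
        """Snapshot the RAT at a branch; returns False if none is free."""
        if len(self.checkpoints) >= self.max_checkpoints:
            return False
        self.checkpoints[tag] = dict(self.rat)
        return True

    def restore(self, tag):
        """Recover in one step by copying the snapshot back."""
        self.rat = dict(self.checkpoints[tag])
        tags = list(self.checkpoints)
        # Assumption: the restored checkpoint and all younger ones are released.
        for t in tags[tags.index(tag):]:
            del self.checkpoints[t]

state = CheckpointedRAT({"r1": "p4", "r2": "p2"}, max_checkpoints=4)
state.take_checkpoint("B")
state.rat["r1"] = "p5"                  # instruction C renames r1 after the branch
state.restore("B")
print(state.rat)
```

Restoring does not depend on how many instructions follow the branch, but every snapshot is a full RAT copy, which is why only a few can be afforded.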
Impact of More Checkpoints • [Figure: concept vs. actual implementation of the RAT checkpoints and the working copy, mapping architectural registers to physical registers] • More checkpoints? Power hungry structure; increased delay • Only a few checkpoints can practically be implemented • Cannot always cover all branches
Intelligent Checkpointing • State of the art solution • Checkpoint allocation: Allocate checkpoints at hard-to-predict branches • Checkpoint management: Release checkpoints as soon as they are no longer needed • Use few checkpoints efficiently
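The allocation and release policy on this slide can be sketched as a small bookkeeping class. The names `CheckpointAllocator`, `on_branch`, and `on_resolve` are hypothetical, and treating branch resolution as the point where a checkpoint is no longer needed is an assumption.

```python
class CheckpointAllocator:
    """Hands out a few checkpoints to hard-to-predict branches only."""

    def __init__(self, num_checkpoints):
        self.free = num_checkpoints
        self.owners = set()             # branches currently holding a checkpoint

    def on_branch(self, branch_id, low_confidence):
        """Allocate a checkpoint only for a low confidence branch, if one is free."""
        if low_confidence and self.free > 0:
            self.free -= 1
            self.owners.add(branch_id)
            return True
        return False

    def on_resolve(self, branch_id):
        """Release the checkpoint as soon as its branch no longer needs it."""
        if branch_id in self.owners:
            self.owners.remove(branch_id)
            self.free += 1

alloc = CheckpointAllocator(num_checkpoints=1)
print(alloc.on_branch("B1", low_confidence=False))  # confident branch: no checkpoint
print(alloc.on_branch("B2", low_confidence=True))   # gets the only checkpoint
print(alloc.on_branch("B3", low_confidence=True))   # none left: branch is uncovered
```

The third branch in the usage lines is the uncovered case: a low confidence branch that has to rely on the slow ROB path if it is mispredicted.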
Conventional Mechanisms: Recovery Scenarios • Misspeculation on a branch w/ a GC: direct recovery (fast) • Misspeculation on a branch w/o a GC: indirect recovery (slow) • With intelligent checkpointing: 30% of recoveries are indirect, and they account for 75% of the performance loss
Outline • Background • BranchTap • Methodology and Results • Summary
BranchTap Motivation • [Figure: ROB contents in the no-wait and wait scenarios around a low confidence branch without a checkpoint, with the recovery cost marked at the point where the misprediction is discovered] • Sometimes, it is better to wait if no checkpoint is available
BranchTap Concept • Key idea: stall when speculation is likely to deteriorate performance • Count the number of low confidence branches w/o a checkpoint • If it exceeds a threshold, stall • Threshold selection: fixed (varies greatly across programs; can deteriorate performance significantly) or adaptive (robust performance) • Minimize recovery cost while conserving good speculation opportunities
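The counting and stalling rule above can be sketched as a small gate. This is a hedged illustration: the class name `BranchTapGate`, its method names, and the decrement on branch resolution are assumptions, and the threshold is a plain attribute because the slides do not spell out the adaptation policy.

```python
class BranchTapGate:
    """Counts in-flight low confidence branches that have no checkpoint."""

    def __init__(self, threshold):
        self.threshold = threshold      # may be rewritten by an adaptive policy
        self.unprotected = 0

    def on_branch_fetched(self, low_confidence, has_checkpoint):
        if low_confidence and not has_checkpoint:
            self.unprotected += 1

    def on_branch_resolved(self, low_confidence, has_checkpoint):
        # Assumption: a resolved branch stops counting toward the threshold.
        if low_confidence and not has_checkpoint and self.unprotected > 0:
            self.unprotected -= 1

    def should_stall(self):
        """Stall once the count exceeds the threshold."""
        return self.unprotected > self.threshold

gate = BranchTapGate(threshold=1)
gate.on_branch_fetched(low_confidence=True, has_checkpoint=False)
gate.on_branch_fetched(low_confidence=True, has_checkpoint=False)
print(gate.should_stall())
```

The state is one counter and one comparison, in line with the claim that the mechanism is simple to implement.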
Threshold Adaptation Policy • BranchTap adapts across and within applications
Outline • Background • BranchTap • Methodology and Results • Summary
Results Overview • Performance w/o Checkpoints • BranchTap improves even with just an ROB • Performance w/ 4 Checkpoints • BranchTap improves over conventional recovery methods • Performance w/ Larger Checkpoint Predictors • BranchTap offers better performance than a 64x larger predictor
Methodology • Simulator based on SimpleScalar • 24 SPEC CPU 2000 benchmarks • Reference inputs • Processor configurations • 8-way OoO core • Up to 1K in-flight instructions • 1K-entry confidence table for low confidence branch identification • 1B committed instructions after skipping 100B
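The slides do not say how the 1K-entry confidence table works. One common design is a table of resetting counters, sketched below purely as an assumption: the class name `ConfidenceTable`, the indexing function, the counter width, and the threshold of 4 are all hypothetical choices for illustration.

```python
class ConfidenceTable:
    """Resetting-counter confidence estimator (an assumed design)."""

    def __init__(self, entries=1024, threshold=4, max_count=15):
        self.entries = entries
        self.threshold = threshold
        self.max_count = max_count
        self.counters = [0] * entries

    def _index(self, pc):
        # Assumption: drop the two low PC bits and wrap around the table size.
        return (pc >> 2) % self.entries

    def is_low_confidence(self, pc):
        """A branch is low confidence until it has enough correct predictions in a row."""
        return self.counters[self._index(pc)] < self.threshold

    def update(self, pc, correct):
        """Count up on a correct prediction, reset to zero on a misprediction."""
        i = self._index(pc)
        if correct:
            self.counters[i] = min(self.counters[i] + 1, self.max_count)
        else:
            self.counters[i] = 0

table = ConfidenceTable()
for _ in range(4):
    table.update(0x400, correct=True)
print(table.is_low_confidence(0x400))
```

In this sketch a single misprediction sends a branch back to low confidence, so the table leans toward flagging branches rather than missing them.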
“Perfect Checkpointing” Configuration • A checkpoint is auto-magically taken at all mispredicted branches • All recoveries are fast • We report the “deterioration relative to perfect checkpointing”
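The slides do not give a formula for this metric. One plausible reading, shown here only as an assumption, is the relative performance loss against the perfect-checkpointing configuration. The function names and the IPC values below are hypothetical and are not results from the talk.

```python
def deterioration(perf_perfect, perf_config):
    """Relative performance loss versus perfect checkpointing (assumed definition)."""
    return (perf_perfect - perf_config) / perf_perfect

def relative_reduction(det_baseline, det_improved):
    """Fraction of the baseline deterioration that an improved scheme removes."""
    return (det_baseline - det_improved) / det_baseline

# Hypothetical IPC values, chosen only to show the arithmetic.
ipc_perfect, ipc_baseline, ipc_improved = 2.00, 1.50, 1.64
d_base = deterioration(ipc_perfect, ipc_baseline)
d_impr = deterioration(ipc_perfect, ipc_improved)
print(d_base, d_impr, relative_reduction(d_base, d_impr))
```

Under this reading, a figure such as "-28% deterioration" would be the second ratio: the share of the gap to perfect checkpointing that the improved scheme closes.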
Performance with No Checkpoints • [Chart: deterioration relative to “perfect checkpointing”; annotations: “better”, “-39% deterioration”] • BranchTap improves over conventional mechanisms • Adaptation leads to robust performance improvements
Performance Evaluation with 4 Checkpoints • [Chart: deterioration relative to “perfect checkpointing”; annotations: “better”, “-28% deterioration”] • BranchTap with 4 checkpoints is better than 6 checkpoints alone
BranchTap vs. Larger Checkpoint Predictors • BranchTap with a 1K-entry confidence table and 4 GCs: higher performance than a 64K-entry confidence table with 4 GCs; lower complexity, virtually comes “for free” • [Chart: deterioration vs. confidence table size, with the BranchTap configuration marked; annotation: “better”]
Outline • Background • BranchTap • Methodology and Results • Summary
Summary • Performance with 4 (no) checkpoints: ~28% (39%) of misprediction penalty removed • BranchTap is robust: up to 6% (13%) better and at most 1.2% (0.1%) worse than conventional mechanisms • BranchTap is very simple to implement: few counters and comparators • BranchTap is better than other alternatives: BT + 1K predictor better than a 64K predictor alone; BT + 4 GCs better than 6 GCs alone
BranchTap: Improving Performance With Very Few Checkpoints Through Adaptive Speculation Control • Patrick Akl and Andreas Moshovos • AENAO Research Group, Department of Electrical and Computer Engineering, University of Toronto • {pakl, moshovos}@eecg.toronto.edu