A Penalty-Sensitive Branch Predictor
Yue Hu, David M. Koppelman, Lu Peng
Department of Electrical and Computer Engineering, Louisiana State University
1. Motivation
Typical branch predictors aim to decrease the misprediction rate (MR), e.g. two-level adaptive (Yeh & Patt), neural (Vintan & Jimenez) and L-TAGE (Seznec).
However, performance can also be improved even if MR does not decrease.
Figure: Run 1 vs. Run 2 over time: the same program on the same computers but with different branch predictors, showing the time that each mispredicted branch spends on the wrong path for low-penalty (LP) and high-penalty (HP) branches.
Why not favor HP branches to decrease their MR? Even if the total MR does not decrease, performance could still be improved.
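A small worked example of the motivation, under stated assumptions: the per-class penalties below are the average penalties of branches predicted LP (121 cycles) and HP (212 cycles) reported in Section 3.1, while the misprediction counts are hypothetical. Two runs with the same number of mispredictions can still waste different amounts of time on the wrong path.

```python
# Hypothetical illustration: same misprediction count, different HP/LP mix.
# Penalties are the Section 3.1 averages; the counts are made up.
AVG_PENALTY = {"LP": 121, "HP": 212}  # cycles per misprediction

def wasted_cycles(mispredictions):
    """Total wrong-path cycles, given misprediction counts per class."""
    return sum(count * AVG_PENALTY[cls] for cls, count in mispredictions.items())

run1 = {"HP": 10, "LP": 10}  # 20 mispredictions
run2 = {"HP": 8, "LP": 12}   # still 20 mispredictions, fewer of them HP

print(wasted_cycles(run1), wasted_cycles(run2))  # 3330 3148
```

Shifting two mispredictions from HP to LP leaves the total MR unchanged but saves 182 cycles in this sketch.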
2. Design Overview
Figure 1. Overall structure of our predictor: a main predictor and an assistant predictor, with components 1, 2 and 3.
1: Penalty predictor: predicts whether a branch is HP or LP;
2: Two-class TAGE predictor: based on TAGE, favors HP branches while providing only normal operation for LP branches;
3: Enabled only when beneficial.
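2.1 Penalty Predictor
Each penalty-table entry holds a 1-bit penalty state (STA) and an 8-bit penalty counter (CNT), initialized to CNT = 0 and STA = LP.
Update:
1. If the penalty is >= 120 cycles, CNT += 8; otherwise, CNT--.
2. If CNT >= 192, STA = HP; otherwise, if CNT == 0, STA = LP; otherwise STA is unchanged.
Because CNT rises by 8 but falls by only 1, the high-penalty state persists for at least hundreds of executions (at least 192 decrements to bring CNT from 192 back to 0), so the branch's subsequent high-penalty executions benefit.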
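The update rule above can be sketched for a single penalty-table entry as follows. This is a minimal sketch assuming saturating 8-bit arithmetic (the slide does not say what happens at 0 or 255) and leaving out how the table is indexed; the class and constant names are hypothetical.

```python
# Minimal sketch of one penalty-table entry (names are hypothetical).
# Assumption: CNT saturates at 0 and 255 (8-bit); the slide does not say.
HP_PENALTY_THRESHOLD = 120  # cycles: "Penalty >= 120 cyc?"
HP_SET_THRESHOLD = 192      # CNT >= 192 -> STA = HP
CNT_MAX = 255               # largest 8-bit value

class PenaltyEntry:
    def __init__(self):
        self.cnt = 0         # 8-bit penalty counter (CNT)
        self.is_hp = False   # 1-bit penalty state (STA); False means LP

    def update(self, penalty_cycles):
        # Step 1: +8 for a high penalty, -1 otherwise.
        if penalty_cycles >= HP_PENALTY_THRESHOLD:
            self.cnt = min(self.cnt + 8, CNT_MAX)
        else:
            self.cnt = max(self.cnt - 1, 0)
        # Step 2: hysteresis; enter HP at 192, return to LP only at 0.
        if self.cnt >= HP_SET_THRESHOLD:
            self.is_hp = True
        elif self.cnt == 0:
            self.is_hp = False
```

With this rule, 24 consecutive high-penalty updates take CNT from 0 to 192 and set HP, after which 192 low-penalty updates are needed before the entry falls back to LP.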
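2.2 Two-class TAGE Predictor (rough idea only)
Prediction:
1. Hash(history, PC) produces an index and a tag for each bank.
2. Index: selects one entry in each bank; Tag: checks whether that entry hits (H) or misses (M).
3. Each tagged entry holds a 3-bit prediction counter, a 9- to 16-bit tag and a 2-bit useful counter (U); the base predictor is a 2-bit bimodal predictor.
4. Higher banks use longer histories and wider tags, so they are more accurate.
5. Final prediction: as in TAGE, taken from the highest bank that hits, or from the bimodal predictor if no bank hits.
Figure: example lookup showing, for each bank, a hit (H) or miss (M) and the U value of the indexed entry.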
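The prediction path above is an ordinary TAGE lookup; the two-class behavior lives in the update. A minimal sketch follows, assuming a hypothetical XOR-folding hash, 1024 entries per bank and illustrative history lengths; none of these parameters come from the slides, which give only the 3-bit counter, 9- to 16-bit tags, 2-bit U and the bimodal base.

```python
from dataclasses import dataclass, field

@dataclass
class Entry:
    valid: bool = False
    tag: int = 0
    ctr: int = 4   # 3-bit prediction counter (0..7); >= 4 predicts taken
    u: int = 0     # 2-bit useful counter

@dataclass
class Bank:
    hist_len: int          # history bits hashed into this bank
    tag_bits: int          # tag width (9 to 16 bits on the slide)
    index_bits: int = 10   # hypothetical: 1024 entries per bank
    entries: list = field(default_factory=list, init=False)

    def __post_init__(self):
        self.entries = [Entry() for _ in range(1 << self.index_bits)]

    def _fold(self, ghist, width):
        # Hypothetical hash: XOR-fold the newest hist_len history bits to width bits.
        h = ghist & ((1 << self.hist_len) - 1)
        out = 0
        while h:
            out ^= h & ((1 << width) - 1)
            h >>= width
        return out

    def index(self, pc, ghist):
        return (pc ^ self._fold(ghist, self.index_bits)) & ((1 << self.index_bits) - 1)

    def tag(self, pc, ghist):
        return ((pc >> 2) ^ self._fold(ghist, self.tag_bits)) & ((1 << self.tag_bits) - 1)

def predict(pc, ghist, bimodal, banks):
    """Return (taken, provider): provider is a bank index, or None for bimodal."""
    # banks[0] has the shortest history; search from the longest down.
    for i in range(len(banks) - 1, -1, -1):
        bank = banks[i]
        entry = bank.entries[bank.index(pc, ghist)]
        if entry.valid and entry.tag == bank.tag(pc, ghist):
            return entry.ctr >= 4, i
    # No tagged hit: fall back to the 2-bit bimodal counter (>= 2 predicts taken).
    return bimodal[pc % len(bimodal)] >= 2, None
```

Searching from the longest-history bank down is what makes the highest hitting bank the provider.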
Update on a misprediction: new entries are allocated in higher banks (banks with longer history than the provider).
LP branch: only one entry is allocated.
HP branch: a second entry is allocated, subject to two limitations so that HP double-entry allocation does not harm LP allocation too much:
1. the bank must hold a useless (U = 0) entry at the indexed position;
2. the last two allocations in that bank must have been one-entry allocations.
Figure: the first allocation goes to the lowest eligible bank, skipping banks whose indexed entry is occupied (still useful); for an HP branch, the second allocation goes to a higher eligible bank, again skipping occupied entries.
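The bank-selection part of this allocation policy can be sketched as below. This is an interpretation under stated assumptions, not the authors' code: `recent` is a hypothetical per-bank record of the last two allocations, banks with fewer than two recorded allocations are treated as eligible, and writing the new entries plus TAGE's usual U decay on failed allocations are omitted.

```python
from collections import deque

def allocate(u_values, provider, is_hp, recent):
    """Pick banks for new entries after a misprediction (hypothetical sketch).

    u_values: U counter of the indexed entry in each bank (index 0 = shortest history).
    provider: index of the provider bank, or -1 if the bimodal table provided.
    is_hp:    penalty state of the branch (True = HP).
    recent:   per-bank deque(maxlen=2); True marks an allocation that was part
              of a double-entry allocation.
    Returns the list of bank indices that receive a new entry.
    """
    chosen = []
    for i in range(provider + 1, len(u_values)):
        if u_values[i] != 0:
            continue  # occupied by a useful entry: skip this bank
        if not chosen:
            chosen.append(i)  # first allocation, for both LP and HP
            if not is_hp:
                break         # LP: only one entry
        elif not any(recent[i]):
            chosen.append(i)  # HP second entry: recent allocations here were one-entry
            break
    double = len(chosen) == 2
    for i in chosen:
        recent[i].append(double)  # remember the allocation kind for limitation 2
    return chosen
```

In this sketch an HP branch that just received a double allocation cannot immediately place another second entry in the same bank, which is one way to keep double-entry allocation from crowding out LP entries.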
An entry has U = 0 in two cases:
1. the entry itself has not been useful recently, if ever;
2. it is a new allocation whose usefulness has not yet been established.
Double-entry allocation favors HP branches: their new entries can survive longer and so have more time to establish their usefulness.
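3. Performance Analysis
3.1 Penalty Predictor
Figure: percentage (%) of branches in each category.
1. Predicted to be HP: 50.2% of branches;
2. Actually HP: 27% of all branches;
3. Predicted LP but turned out to be HP: 1.3% of actual HP branches, so the penalty predictor covers 98.7% of actual HP branches.
Average penalty of branches predicted LP: 121 cycles; predicted HP: 212 cycles.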
3.2 Two-class TAGE Predictor
Figure: MR (annotated: all negative).
1. The MR of HP branches is about 10% higher; such branches include loop branches and branches with cache misses;
2. The Penalty-Sensitive (PS) method effectively favors HP branches;
3. At 64KB, the MR changes by -6E-5 for HP branches and by +3E-5 for LP branches. Overall, the method is beneficial.
4. Summary
Our penalty-sensitive branch predictor works.
Penalty predictor: 50.2% of branches predicted HP; covers 98.7% of actual HP branches; average penalty HP vs. LP = 212 : 121 cycles.
Two-class TAGE predictor: favors HP branches; globally beneficial, but the benefit is limited.
Limited favoring mechanism: double-entry allocation for HP branches increases the chance that their new entries survive long enough to establish usefulness.
Future: a more helpful favoring mechanism is needed.
Conclusion:
1. Mispredicted HP branches are more harmful;
2. Even if the total MR does not decrease, performance could still be improved by favoring HP branches;
3. The approach can be applied to any predictor once an effective favoring mechanism is found.
Thanks! Question & Answer
Penalty Predictor (Backup Slides)
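Two-class TAGE Predictor (Backup Slides)
Figure: MR change of HP branches: -6E-5 with the Penalty-Sensitive method; -4.7E-4 by doubling the storage budget.
$\dfrac{-6\times10^{-5}}{-4.7\times10^{-4}} \approx 12.8\%$
The Penalty-Sensitive method achieves 12.8% of the MR improvement on HP branches that would be achieved by doubling the storage budget.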
Loop Predictor (Backup Slides)
Figure: average MPPKI (misprediction penalty per kilo-instructions), normalized to 1000.
Very efficient: 1.3% improvement with only 0.53KB.