340 likes | 352 Views
This lecture covers bimodal and two-level branch prediction schemes, with examples and comparisons to improve processor efficiency and prediction accuracy. Topics include 1-bit and 2-bit branch history tables, global prediction techniques, and pros/cons of dynamic prediction methods.
E N D
Lecture on High Performance Processor Architecture (CS05162) Dynamic Branch Prediction Scheme An Hong han@ustc.edu.cn Fall 2007 University of Science and Technology of China Department of Computer Science and Technology
Outline • Bimodal Branch Prediction Scheme • Two-Level Branch Prediction Scheme • 混合预测算法的例子:Alpha 21264的分支预测器 CS of USTC AN Hong
Dynamic Prediction(1):利用单个分支自身历史(基于模式的预测) • Dynamic Prediction :Use run-time information to make prediction • example: Branch Prediction Buffer(1-位预测器) CS of USTC AN Hong
Dynamic Prediction: 1-bit BHT • Branch History Table • Lower bits of PC address index table of 1-bit values • Says whether or not branch taken last time • No address check • Problem: in a loop, 1-bit BHT will cause two mispredictions : • End of loop case, when it exits instead of looping as before • First time through loop on next time through code, when it predicts exit instead of looping CS of USTC AN Hong
Dynamic Prediction:1-bit BHT (Branch Prediction Buffer) • Pros: • Small. 1 bit per entry • can fit lots of entries • Always returns a prediction • Cons: • aliasing between branches • one bit of state mispredicts many branches for (I =0; I<10; I++) { a = a + 1; } Two mispredictions per loop invocation CS of USTC AN Hong
T 11 NT 10 Predict Taken Predict Taken T 00 NT T NT Predict Not Taken Predict Not Taken T 01 NT Dynamic Prediction(2):Bimodal Branch Prediction Scheme(2bits BHT, 2-位饱和预测器) • Solution: 2-bit predictor where change prediction only if get misprediction twice: Use extra state to reduce mispredictions at loop ends • Red: stop, not taken • Green: go, taken • Adds hysteresis to decision making process BHT T = Taken N = Not Taken 2-bit Saturating Up-down Counter CS of USTC AN Hong
Bimodal Branch Prediction Scheme • Strategy:Based on the direction the branch went the last few times it was executed. • Based on a little self-history pattern • Based on a counter • Works well: • when each branch is strongly biased in a particular direction. • For scientific/engineering applications where program execution is dominated by inner-loops. CS of USTC AN Hong
Bimodal Branch Prediction Scheme • 例1:…NNNTNNN… • 1-位预测器,出现2次预测错 • 2-位预测器,出现1次预测错 • 例2:TNTNTN…., 初始状态为01的2-位预测器,出现100%预测错 • BHT方法准确度 • Mispredict because either: • Wrong guess for that branch • Got branch history of wrong branch when index the table • 4096 entry table programs vary from 1% misprediction (nasa7, tomcatv) to 18% (eqntott), with spice at 9% and gcc at 12% • 4096 about as good as infinite table (in Alpha 21164) • 2-bit已经足够, n-bit (n>2)与2-bit效果差不多 CS of USTC AN Hong
Pros/Cons • Cons: • Now only mispredicts once on each loop • Also good for data-dependent branches where most data points the same way • ex: checking for termination character at the end of a string • Pros: • Still have aliasing problem between branches • only uses information about history of current branch(self-history, or local-history) • But, sequences of branches often correlate CS of USTC AN Hong
Dynamic Prediction(3):利用多个分支全局历史(基于相关的预测) • 例子:Taken from the equntott benchmark if (aa == 2) aa = 0; if (bb == 2) bb = 0; if (aa != bb) {……} • If first two branches taken, third always taken • 在许多整型应用中,控制流常常是很复杂的,并且一个分支的结果经常受已执行的其它分支的结果影响。也就是说,分支是相关的。由于这种相关性,如果只看分支本身的历史,单个分支的历史看上去缺少规律性,如果直接用它来做局部预测,精度就不会高。 CS of USTC AN Hong
基于相关的动态分支预测技术 • 预测原理 • 根据若干条相邻分支最近执行历史的组合——相邻历史(neighbor-history),或称全局历史(global-history)来做预测,因此又称全局预测技术 • 预测器结构:全部采用二级结构 • 第一级为一个分支历史寄存器(Branch History Register, BHR),用于记录所有相关分支最近的执行历史; • 第二级为一个全局的模式历史表(Global pattern history table,PHT),用于记录每条分支的行为状态。 • 典型的全局预测器是Pan, So和Rahmeh提出的基于相关的预测器,McFarling称之为gselect预测器;McFarling在[Pan, So和Rahmeh]的基础上为解决命名冲突提出了gshare预测器; Yeh和Patt提出的两级自适应预测器中的其中三种配置:GAg/GAp /GAs预测器也属全局预测器。 CS of USTC AN Hong
Two-Level Branch Prediction (GAg) • Use history of recent branches CS of USTC AN Hong
n = full address Branch Address(PC) 2-bits n-bits • Denotation • GAp(k) • G: “Global” BHR • A: adaptive • p: “per-address” PHT • k: BHR length 16 entries k=4 k-bits global BHR (Shift left when update) per-address PHTs Global Branch Prediction Scheme: Global BHR/pre-address PHTs 相应于每个路径的子历史表 GAp(k) = GAp(4),又称(k,n)-gselect预测器 CS of USTC AN Hong
通向b3的路径 A:0-0 B:0-1 C:1-0 D:1-1 在b3执行之前可以推测 aa=0 bb=0 aa=0 bb2 aa2 bb=0 aa2 bb2 Global Branch Prediction Scheme: 设计思想 例:考虑下面的代码段: if (aa == 2) // branch b1 aa =0; if (bb == 2) // branch b2 bb =0; if (aa != bb) { // branch b3 …… } 这段代码由编译器转换为以下汇编程序(假定aa和bb分别分配为R1和R2): SUBI R3,R1,#2 BNEZ R3,L1 ;branch b1 (aa != 2) , taken ADD R1, R0, R0 ;aa == 0, not taken L1: SUBI R3,R2,#2 BNEZ R3,L2 ;branch b2 (bb != 2) , taken ADD R2, R0, R0 ;bb == 0, not taken L2:SUB R3,R1,R2 ;R3=aa-bb BEQZ R3,L3 ;branch b3 (aa = = bb) , taken CS of USTC AN Hong
aa bb aa' bb' 到达b3的路径 PHT的当前状态 b3预测结果 b3的实际结果 正确:c 错误:w PHT的下一个状态 0 2 0 0 C 0 N T w 1 2 2 0 0 A 1 N T w 2 2 1 0 1 B 2 T N w 1 2 0 0 0 B 1 N T w 2 2 2 0 0 A 2 T T c 3 1 0 1 0 D 3 T N w 2 1 0 1 0 D 2 T N w 1 2 0 0 0 B 1 N T w 2 0 1 0 1 D 2 T N w 1 1 1 1 1 D 1 N T w 2 1 2 1 0 C 2 T N w 1 1 2 1 0 C 1 N N c 0 2 2 0 0 A 0 N T w 1 2 0 0 0 B 1 N T w 2 0 1 0 1 D 2 T N w 1 2 2 0 0 A 1 N T w 2 0 2 0 0 C 2 T T c 3 0 1 0 1 D 3 T N w 2 1 0 1 0 D 2 T N w 1 2 2 0 0 A 1 N T w 2 Global Branch Prediction Scheme: 设计思想 ③ ④ ⑥ ⑤ aa’和bb’是b1和b2执行后aa和bb的新值 CS of USTC AN Hong ① ②
Global Branch Prediction Scheme: 设计思想 路径 B1 和 B2的方向 B3的实际结果 CS of USTC AN Hong
Pros/Cons of Two-Level Branch Prediction • Pros: • Predicts correlated branch behavior that breaks other predictors • eqntott example • Better overall performance than purely address-based predictors • Cons: • Interference between unrelated branches with same history • example: all loop-end branches will map to same entry in pattern history table • sometimes this is a good thing CS of USTC AN Hong
n = full address Branch Address(PC) 2-bits n-bits • Denotation • GAp(k) • G: “Global” BHR • A: adaptive • p: “per-address” PHT • k: BHR length 16 entries k=4 k-bits global BHR per-address PHTs Global Branch Prediction Scheme GAp(k) 又称(k,n)-gselect预测器 • (0,n)- gselect 预测器,即为2-位(双峰)预测器 • (M,0)- gselect预测器,即为GAg(M)预测器 CS of USTC AN Hong
n =10 Branch Address(PC) 2-bits n-bits • Denotation • GAs(n, 2k) • G: “Global” BHR • A: adaptive • s: “per-set” PHT • k: BHR length 16 entries k=4 k-bits global BHR per-set PHTs Global Branch Prediction Scheme: Global BHR/pre-set PHTs GAs(4,1024) CS of USTC AN Hong
PHT Branch Address(PC) n-bits n n+k + k k-bits BHR by S.T.Pan 1992 Global Branch Prediction Scheme: Gselect Two-level predictor(另一种画法) CS of USTC AN Hong
by Scott McFarling 1993 PHT Branch Address n-bits n nk XOR k k-bits BHR Global Branch Prediction Scheme: Gshare Two-level predictor CS of USTC AN Hong
2-bits 2k entries k-bits BHR = Branch History Register (Shift left when update) PHT = Pattern History Tables (2-bit Saturating Up-down Counter ) Global Branch Prediction Scheme: Global BHR/Global PHT Global branch predictor • Denotation • GAg(k) • G: “Global” BHR • A: adaptive • g: “global” PHT • k: BHR length GAg(k) CS of USTC AN Hong
Problem: Two code sequences may have the same bit pattern in the BHR and thus index the same pattern in the PHT. 2-bits 4096 entries Global branch Path History, Shift left when update 111100001111 k = 12 global BHR global PHT Global Branch Prediction Scheme: Global BHR/Global PHT GAg(12) CS of USTC AN Hong
by Scott McFarling 1993 1111 1111 0000 0000 0000 0000 1111 1111 Branch Address BHR 0000 0001 0000 0000 1000 0000 0000 0000 gselect 4/4 0000 0000 11110000 11110000 00000001 01111111 gselect 8/8 00000000 00000001 11111111 Global Branch Prediction Scheme: Gshare Two-level predictor n = 8 k = 8 n = 4 k = 4 n k hashed CS of USTC AN Hong
Global Branch Prediction Scheme • Strategy • Based on the combined history of all recent branches • Based on a Shift register and a counter • Works well • when the direction taken by sequentially executed branches is highly correlated. • 11% additional accuracy (compared with 2-bit scheme) at the extra hardware cost of one shift register • Example: if ( a = = 2) a=0; // b1 if ( b = = 2) b = 0; // b2 if (a != b ) { // b3 … } if (x<1) {...} if (x>1) {...} CS of USTC AN Hong
per-address BHT per-set BHT Global BHR SAg Global PHT GAg PAg PAp SAp per-address PHTs GAp PAs GAs per-set PHTs SAs Two-level adaptive predictors:Variations 基于多个分支全局历史的(基于相关的)预测方法 基于单个分支局部历史的(基于模式的)预测方法 CS of USTC AN Hong
Branch Address (PC) N-bits 2-bits full address k-bits k=4 16 entries 1100 1100 per-address BHT global PHT Local Branch Prediction Scheme: pre-address BHR/global PHT PAg(4) CS of USTC AN Hong
2-bits Branch Address (PC) n-bits n 2k entries k-bits k 2n entries PHT = Pattern History Tables (2-bit Saturating Up-down Counter ) SAg(k) BHT = Branch History Table (Shift left when update) Local Branch Prediction Scheme: SAg(n) Two-level adaptive predictor CS of USTC AN Hong
2-bits Branch Address (PC) n-bits n=10 16 entries k-bits k=4 1100 1100 1024 entries PHT BHT Local Branch Prediction Scheme: SAg(n) Two-level adaptive predictor SAg(4) CS of USTC AN Hong
2-bits 2-bits 2-bits Local Branch Prediction Scheme: pre-address BHR/per-address PHTs Branch Address (PC) b2 per-address PHTs b1 N-bits full address n-bits k=4 b1 1100 b2 1100 16 entries per-address BHT PAp(4) CS of USTC AN Hong
Local Branch Prediction Scheme • Strategy:considers the history of each branchindependently and takes advantage of repetitive patterns. • Works well: branches with simple repetitive patterns. • Example: • for (I=1, I<=4; I++){ } //its pattern is(1110)n CS of USTC AN Hong
性能比较 @SPEC89 CS of USTC AN Hong
性能比较 @SPEC89 CS of USTC AN Hong
混合预测算法的例子:Alpha 21264的分支预测器 per-set BHT Global history prediction global PHT Global PHT GAg(12) k =10 n=10 k = 12 Selector (112) Local history prediction Global BHR SAg(10) CS of USTC AN Hong