
Computer Architecture


tadeo




Presentation Transcript


  1. Computer Architecture Lecture 6 Overview of Branch Prediction

  2. Prediction accuracy of a 4096-entry 2-bit prediction buffer vs. an infinite buffer [bar chart; x-axis: frequency of mispredictions, 0% to 18%]. Misprediction rates (4096 entries, 2 bits per entry / unlimited entries, 2 bits per entry): matrix300 0% / 0%; spice 9% / 9%; fpppp 9% / 9%; gcc 12% / 11%; espresso 5% / 5%; eqntott n/a; li 10% / 10%.
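To make the finite-versus-unlimited comparison concrete, here is a minimal Python sketch of a prediction buffer of 2-bit saturating counters. Assumptions not taken from the slides: the buffer is indexed by the branch address with the low 2 bits dropped (4-byte instructions), counters start at 0 (strongly not taken), and `entries=None` stands in for the unlimited buffer. The class and function names are hypothetical.

```python
class TwoBitPredictor:
    """Prediction buffer of 2-bit saturating counters (0-1: not taken, 2-3: taken)."""

    def __init__(self, entries=4096):
        self.entries = entries      # None models an unlimited buffer (no aliasing)
        self.counters = {}          # index -> counter value, default 0

    def _index(self, pc):
        word = pc >> 2              # assumption: 4-byte, word-aligned instructions
        return word if self.entries is None else word % self.entries

    def predict(self, pc):
        return self.counters.get(self._index(pc), 0) >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        c = self.counters.get(i, 0)
        # Saturate at 3 on taken and at 0 on not taken.
        self.counters[i] = min(c + 1, 3) if taken else max(c - 1, 0)


def misprediction_rate(pred, trace):
    """trace: list of (branch_pc, taken) pairs in execution order."""
    misses = 0
    for pc, taken in trace:
        misses += pred.predict(pc) != taken
        pred.update(pc, taken)
    return misses / len(trace)
```

Two branches whose word addresses differ by 4096 share a counter in the 4096-entry buffer, so a trace like `[(0, True), (16384, False)] * 50` mispredicts half the time there but only twice with unlimited entries. The nearly equal bars on this slide suggest that such interference is uncommon at 4096 entries for these benchmarks.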

  3. Comparison of 2-bit predictors [bar chart; x-axis: frequency of mispredictions (%), 0 to 18]. Misprediction rates (local, 4096 entries, 2 bits per entry / unlimited entries, 2 bits per entry / 1024 entries, (2,2) correlating): matrix300 0% / 0% / n/a; spice 9% / 9% / 5%; fpppp 9% / 9% / 5%; gcc 12% / 11% / 11%; espresso 5% / 5% / 4%; eqntott n/a / n/a / 6%; li 10% / 10% / 5%.
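A (2,2) correlating predictor can be sketched the same way: each of the 1024 entries, selected by the branch address, holds 2^2 = 4 two-bit counters, and the outcomes of the last 2 branches (global history) pick which counter to use. That is 1024 × 4 × 2 = 8192 bits, the same storage as the 4096-entry × 2-bit buffer (4096 × 2 = 8192). Indexing by `pc >> 2`, zero-initialised counters and history, and the helper names are assumptions for illustration.

```python
class CorrelatingPredictor:
    """(m, n) predictor: m bits of global history select one of 2**m n-bit
    counters in the entry chosen by the branch address."""

    def __init__(self, entries=1024, m=2, n=2):
        self.entries, self.m, self.n = entries, m, n
        self.max_count = (1 << n) - 1          # 3 for 2-bit counters
        self.taken_threshold = 1 << (n - 1)    # counters >= 2 predict taken
        self.history = 0                       # last m outcomes, newest in bit 0
        self.table = [[0] * (1 << m) for _ in range(entries)]

    def _row(self, pc):
        return self.table[(pc >> 2) % self.entries]   # assumption: 4-byte instructions

    def predict(self, pc):
        return self._row(pc)[self.history] >= self.taken_threshold

    def update(self, pc, taken):
        row, h = self._row(pc), self.history
        row[h] = min(row[h] + 1, self.max_count) if taken else max(row[h] - 1, 0)
        # Shift the outcome into the global history register, keeping m bits.
        self.history = ((h << 1) | int(taken)) & ((1 << self.m) - 1)

    def size_bits(self):
        return self.entries * (1 << self.m) * self.n


def count_mispredictions(pred, trace):
    """trace: list of (branch_pc, taken) pairs in execution order."""
    misses = 0
    for pc, taken in trace:
        misses += pred.predict(pc) != taken
        pred.update(pc, taken)
    return misses
```

On a branch that alternates taken/not taken, the history separates the two cases, so after 3 warm-up misses every prediction in `[(0x400, True), (0x400, False)] * 50` is correct, whereas a single 2-bit counter starting at 0 would miss on every taken outcome.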

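  4. Tournament Predictor [state diagram: a 2-bit selector with states 11 and 10 labeled "Use predictor P1" and states 01 and 00 labeled "Use predictor P2"; the transitions are labeled "P1 Correct" and "P2 Correct", moving the selector toward the P1 states when P1 is correct and toward the P2 states when P2 is correct]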
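The selector in the state diagram behaves like a 2-bit saturating counter. A minimal Python sketch follows, assuming (as is usual for tournament selectors, but not spelled out on the slide) that the state changes only when exactly one of the two predictors was correct; the initial state 11 and the method names are hypothetical.

```python
class TournamentSelector:
    """2-bit selector: states 3 (11) and 2 (10) use P1; 1 (01) and 0 (00) use P2."""

    def __init__(self, state=3):
        self.state = state          # assumption: start in "strongly use P1"

    def use_p1(self):
        return self.state >= 2

    def choose(self, p1_prediction, p2_prediction):
        """Return the prediction of whichever predictor is currently selected."""
        return p1_prediction if self.use_p1() else p2_prediction

    def update(self, p1_correct, p2_correct):
        if p1_correct and not p2_correct:
            self.state = min(self.state + 1, 3)   # move toward the P1 states
        elif p2_correct and not p1_correct:
            self.state = max(self.state - 1, 0)   # move toward the P2 states
        # Both correct or both wrong: the selector keeps its state.
```

Because two consecutive "only P2 correct" outcomes are needed to leave the P1 states, one odd branch does not immediately flip the choice, the same hysteresis a 2-bit branch counter provides.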

  5. Misprediction rate of three predictors [line chart; y-axis: conditional branch misprediction rate, 0% to 8%; x-axis: total predictor size, 0 to 512 Kbits; series: local 2-bit predictor, correlating predictor, tournament predictor] • Note that predictors must be compared at equal capacity. The size of each level has to be chosen to optimize prediction accuracy. Influencing factors: the degree of interference between branches, and whether the program is likely to benefit from local or global history.

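  6. Why Prediction? • Prediction reduces branch hazards in pipelined processors. • It is used in almost all pipelined processors. [Figure: the branch prediction buffer supplies a taken/not-taken (T/NT) prediction that drives a mux selecting the next PC: PC+4 on input 0 or the branch target address cache output on input 1; the actual next PC is also shown.]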

  7. A Branch Target Buffer [Figure: the PC of the instruction to fetch is looked up and compared (=) against the entries of the branch target buffer; each entry holds a predicted PC plus prediction hardware (counter, etc.) indicating whether the branch is predicted taken or untaken. No match: not a branch instruction; proceed normally. Match: the instruction is a branch; use the predicted PC as the new PC.]

  8. Handling an instruction with a branch-target buffer [flowchart spanning the IF, ID, and EX stages]: Send the PC to memory and to the branch-target buffer. Is an entry found in the branch-target buffer? • No: is the instruction a taken branch? No: normal instruction execution. Yes: enter the branch instruction address and next PC into the branch-target buffer. • Yes: send out the predicted PC. Taken branch? Yes: branch correctly predicted; continue execution with no stalls. No: mispredicted branch; kill the fetched instruction.
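The lookup on slide 7 and the decision flow on slide 8 can be combined in a small Python sketch. It is a hypothetical model, not a hardware description: the buffer is an unbounded dict keyed by the full branch PC (a real BTB has a fixed number of entries and compares tags), instructions are assumed to be 4 bytes so the fall-through PC is PC+4, and the outcome strings simply paraphrase the flowchart boxes.

```python
class BranchTargetBuffer:
    """Toy BTB: maps the PC of a taken branch to its predicted next PC."""

    def __init__(self):
        self.entries = {}   # assumption: unbounded, fully associative

    def fetch(self, pc):
        """IF: look up the PC; return (next fetch PC, whether an entry was found)."""
        if pc in self.entries:
            return self.entries[pc], True    # send out the predicted PC
        return pc + 4, False                 # not in buffer: proceed normally

    def resolve(self, pc, hit, is_taken_branch, target):
        """Compare the lookup result with what the instruction actually does."""
        if not hit:
            if is_taken_branch:
                self.entries[pc] = target    # enter branch address and next PC
                return "enter branch into BTB"
            return "normal execution"
        if is_taken_branch:
            return "correctly predicted, no stall"
        return "mispredicted, kill fetched instruction"
```

For example, the first execution of a taken branch at `0x100` misses in the buffer and is entered with its target; the next `fetch(0x100)` then returns the stored target immediately, which is what lets a correctly predicted taken branch proceed with no stall.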

  9. Penalties for possible combinations of whether the branch is in the buffer

  10. Static superscalar pipeline in operation: fetch 64 bits/clock cycle; Int on left, FP on right. – The 2nd instruction can issue only if the 1st instruction issues. – More ports are needed on the FP registers to do an FP load and an FP op in a pair.
      Type               Pipe stages
      Int. instruction   IF  ID  EX  MEM WB
      FP instruction     IF  ID  EX  MEM WB
      Int. instruction       IF  ID  EX  MEM WB
      FP instruction         IF  ID  EX  MEM WB
      Int. instruction           IF  ID  EX  MEM WB
      FP instruction             IF  ID  EX  MEM WB
      • A 1-cycle load delay now affects 3 instructions in the superscalar: the instruction in the right half of the same pair can't use the loaded value, nor can the instructions in the next issue slot.
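The issue rule and the load-delay window above can be sketched in Python. Assumptions: instructions carry only a hypothetical `kind` tag ("int" for integer ops, loads/stores and branches; "fp" for FP operations), hazard detection is abstracted into a caller-supplied `ready` predicate, and a pair whose left slot holds an FP op simply issues that one instruction alone.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    text: str
    kind: str   # "int" (integer ops, loads/stores, branches) or "fp"


def issue_pair(left, right, ready):
    """Instructions issued this cycle from one 64-bit fetch (left slot, right slot).

    ready: hypothetical hazard check, True if the instruction may issue now.
    """
    if not ready(left):
        return []                        # the 2nd can only issue if the 1st issues
    if left.kind == "int" and right is not None and right.kind == "fp" and ready(right):
        return [left, right]             # Int on left, FP on right: dual issue
    return [left]


def blocked_by_load(load_pos, pos):
    """True if the instruction at position pos cannot use a load at load_pos.

    Positions count instructions in program order, 2 per issue slot; with a
    1-cycle load delay the rest of the load's slot and the whole next slot
    must not use the loaded value.
    """
    return load_pos < pos and pos // 2 <= load_pos // 2 + 1
```

With the load in the left slot at position 0, positions 1, 2 and 3 fall inside the delay window, which is the "3 instructions" counted on the slide.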

  11. Dynamic superscalar pipeline in operation [block diagram: instructions from the instruction cache pass through integer and FP ISSUE/Rename to RS stages, which check for a free RS and for RAW hazards; reservation stations (wait for operands) feed the LD/ST unit (TAC, Mem Access), the integer unit (Read Reg, EX), the FP adder (A1 to A4), the FP multiplier (M1 to M7), and the divider; results return over two common data buses, CDB #1 and CDB #2 (wider bus), to Write Reg]

  12. Example 1
      Loop: L.D    F0,0(R1)     ; F0 = array element
            ADD.D  F4,F0,F2     ; F4 = F0 + F2
            S.D    F4,0(R1)     ; store result
            ADDIU  R1,R1,#-8    ; 8 bytes (per DW)
            BNE    R1,R2,Loop   ; branch if R1 != R2
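As a reference for what the schedules on the following slides compute, here is a minimal Python model of Example 1. Memory is assumed to be a dict from byte address to double, and the function name is hypothetical; the loop body runs before the test, matching BNE at the bottom of the loop.

```python
def run_example_loop(mem, r1, r2, f2):
    """Add F2 to each double from address R1 down to R2 + 8, stepping by 8 bytes."""
    while True:
        f0 = mem[r1]        # L.D   F0,0(R1)
        f4 = f0 + f2        # ADD.D F4,F0,F2
        mem[r1] = f4        # S.D   F4,0(R1)
        r1 -= 8             # ADDIU R1,R1,#-8
        if r1 == r2:        # BNE   R1,R2,Loop falls through when R1 == R2
            break
    return mem
```

With three elements at addresses 24, 16 and 8 and R2 = 0, the loop runs three iterations (15 instructions), which is the instruction stream the dual-issue slides schedule.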

  13–29. Dual issue, 1 Integer Unit, FPMUL = 3 cc

  30–43. Dual issue, 2 Integer Units

  44. Speculative Execution • Needs to overcome: • branch hazards • precise exceptions

  45. Speculative Pipeline [block diagram: ISSUE/Rename to RS (check for RS, check for RAW) feeds reservation stations that wait for operands in front of the LD/ST unit (TAC, Mem Access), the integer unit (EX), the FP adder (A1 to A4), the FP multiplier (M1 to M7), and the divider; results travel on the CDB to a reorder buffer (ROB), which performs Write Reg; Read Reg is also shown]

  46. The Hardware: Reorder Buffer
      • If instructions write their results in program order, registers and memory always get the correct values.
      • Reorder buffer (ROB): puts out-of-order instructions back into program order at the time they write registers/memory (commit).
      • If some instruction goes wrong, handle it at commit time: just flush the instructions after it.
      • Instructions cannot write registers/memory immediately after execution, so the ROB also buffers the results; there is no such place in the original Tomasulo design.
      [Figure components: IM, Fetch Unit, Decode, Rename, Reorder Buffer, Regfile, S-buf, L-buf, two RSs, FU1, FU2, DM]
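A minimal Python sketch of the reorder-buffer idea: results are buffered in ROB entries when instructions finish (possibly out of order) and reach the register file only at the head, in program order; a mispredicted branch at the head flushes everything behind it. The entry fields, the dict register file, and the omission of stores and exceptions are simplifying assumptions.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class ROBEntry:
    dest: Optional[str]             # architectural register; None for a branch
    value: Optional[float] = None
    done: bool = False              # finished execution; result buffered here
    mispredicted: bool = False      # branch whose prediction turned out wrong


class ReorderBuffer:
    def __init__(self):
        self.entries = deque()      # program order, head on the left
        self.regfile = {}

    def allocate(self, dest):
        """Called at issue: reserve the next entry in program order."""
        entry = ROBEntry(dest)
        self.entries.append(entry)
        return entry

    def commit(self):
        """Retire finished entries from the head, strictly in program order."""
        while self.entries and self.entries[0].done:
            head = self.entries.popleft()
            if head.dest is not None:
                self.regfile[head.dest] = head.value
            if head.mispredicted:
                self.entries.clear()   # flush all younger (speculative) work
                break
```

If an F4 write after a branch finishes before the older F0 write, nothing commits until F0 is done; when the mispredicted branch between them then commits, the speculative F4 result is discarded without ever touching the register file.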

  47. Speculative Tomasulo Algorithm
      • Issue: get an instruction from the FP Op Queue
        – Condition: a free RS at the required FU
        – Actions: (1) decode the instruction; (2) allocate an RS and a ROB entry; (3) rename the source registers; (4) rename the destination register; (5) read the register file; (6) dispatch the decoded and renamed instruction to the RS and ROB
      • Execution: operate on operands (EX)
        – Condition: at a given FU, at least one instruction is ready
        – Action: select a ready instruction and send it to the FU
      • Write result: finish execution (WB)
        – Condition: at a given FU, some instruction finishes FU execution
        – Actions: (1) the FU writes to the CDB, broadcasting to all RSs and to the ROB; (2) the FU broadcasts the tag (ROB index) to all RSs; (3) de-allocate the RS. Note: no register status update at this time

  48. Speculative Tomasulo Algorithm
      • Commit: update the register with the result from the reorder buffer
        – Condition: the ROB is not empty and the instruction at the ROB head has finished execution
        – Actions if there is no misprediction/exception: (1) write the result to register/memory; (2) update the register status; (3) de-allocate the ROB entry
        – Actions on a misprediction/exception: flush the pipeline, e.g. (1) flush the IFQ; (2) clear the register status; (3) flush all RSs and reset the FUs; (4) reset the ROB
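The four steps can be sketched in Python. This is a toy model under stated assumptions, not the lecture's hardware: one FU executes any single ready RS per `execute_and_write()` call, the ROB is an unbounded list with a head index, `OPS` is a hypothetical operation table, and branches, memory ops and exceptions are reduced to a `flush()` that follows the recovery steps above. Note that the register status table is updated at issue (renaming) and at commit, but not at write result.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

OPS = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}   # hypothetical ops


@dataclass
class RobEntry:
    dest: str
    value: Optional[float] = None
    ready: bool = False


@dataclass
class RS:
    op: str
    rob: int                        # tag: ROB index this RS will write
    vj: Optional[float] = None
    vk: Optional[float] = None
    qj: Optional[int] = None        # ROB tag still awaited, None if vj is valid
    qk: Optional[int] = None


class SpecTomasulo:
    def __init__(self, regs):
        self.regs = dict(regs)                 # architectural register file
        self.status: Dict[str, int] = {}       # register -> ROB tag of newest producer
        self.rob: List[RobEntry] = []
        self.head = 0                          # oldest uncommitted ROB entry
        self.rs: List[RS] = []

    def _read(self, src):
        """Source renaming: return (value, None) or (None, tag to wait for)."""
        if src in self.status:
            entry = self.rob[self.status[src]]
            return (entry.value, None) if entry.ready else (None, self.status[src])
        return self.regs[src], None

    def issue(self, op, dest, src1, src2):
        tag = len(self.rob)
        self.rob.append(RobEntry(dest))        # allocate ROB entry
        vj, qj = self._read(src1)
        vk, qk = self._read(src2)
        self.rs.append(RS(op, tag, vj, vk, qj, qk))
        self.status[dest] = tag                # destination renaming
        return tag

    def execute_and_write(self):
        """Execute one ready RS, then broadcast its result and tag on the CDB."""
        for st in self.rs:
            if st.qj is None and st.qk is None:
                result = OPS[st.op](st.vj, st.vk)
                self.rs.remove(st)             # de-allocate the RS
                self.rob[st.rob].value, self.rob[st.rob].ready = result, True
                for other in self.rs:
                    if other.qj == st.rob:
                        other.vj, other.qj = result, None
                    if other.qk == st.rob:
                        other.vk, other.qk = result, None
                return st.rob
        return None

    def commit(self):
        """Retire the ROB head if it has finished; returns True on commit."""
        if self.head < len(self.rob) and self.rob[self.head].ready:
            entry = self.rob[self.head]
            self.regs[entry.dest] = entry.value
            if self.status.get(entry.dest) == self.head:
                del self.status[entry.dest]    # no newer producer in flight
            self.head += 1
            return True
        return False

    def flush(self):
        """Misprediction/exception recovery: drop all uncommitted work."""
        del self.rob[self.head:]
        self.rs.clear()
        self.status.clear()
```

In a two-instruction example (F4 = F0 + F2, then F6 = F4 * F2), the multiply waits on ROB tag 0, picks up 3.0 from the CDB broadcast, and neither F4 nor F6 changes in the register file until the two commits.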
