
Lecture 11: Pipelining and Branch Prediction

EEN 312: Processors: Hardware, Software, and Interfacing. Department of Electrical and Computer Engineering, Spring 2014, Dr. Rozier (UM).


Presentation Transcript


  1. Lecture 11: Pipelining and Branch Prediction EEN 312: Processors: Hardware, Software, and Interfacing Department of Electrical and Computer Engineering Spring 2014, Dr. Rozier (UM)

  2. THE QUIZ SHOW!

  3. Today’s class will be a quiz show
     • We will be solving puzzles involving pipelining, branch prediction, and the stack.
     • Form groups of 8 people.
     • Correct solutions earn points, and extra credit points are awarded to the top teams:
        • 4 pts for 1st place
        • 3 pts for 2nd place
        • 2 pts for 3rd place
        • 1 pt for 4th place

  4. The Rules!
     • Each group will elect a “buzzer.” When the buzzer raises his hand, your group will be called on to solve the puzzle.
     • One representative per group will be sent up to give the answer and explain it.
     • Once the buzzer has raised his hand, your group must stop discussing the answer!

  5. PIPELINING

  6. Pipelining
     • Assume r5 != r4.
     • Assume there is one memory for instructions and data.
     • During a cycle, either data can be loaded or stored for an instruction OR an instruction can be fetched, not both.
     (100) A structural hazard exists. What is it?
         str r0, [r1, #16]
         ldr r0, [r1, #8]
         cmp r5, r4
         beq label
         add r5, r2, r4
         add r5, r5, r0

  7. Pipelining
     • Assume r5 != r4.
     • Assume there is one memory for instructions and data.
     • During a cycle, either data can be loaded or stored for an instruction OR an instruction can be fetched, not both.
     (200) Can this structural hazard be eliminated by adding “bubbles” to the pipeline in the form of NOP instructions?
         str r0, [r1, #16]
         ldr r0, [r1, #8]
         cmp r5, r4
         beq label
         add r5, r2, r4
         add r5, r5, r0

  8. Pipelining
     • Assume r5 != r4.
     • Assume there is one memory for instructions and data.
     • During a cycle, either data can be loaded or stored for an instruction OR an instruction can be fetched, not both.
     (300) To guarantee forward progress, how must this hazard be resolved: in favor of data access or instruction fetching? Why?
         str r0, [r1, #16]
         ldr r0, [r1, #8]
         cmp r5, r4
         beq label
         add r5, r2, r4
         add r5, r5, r0

  9. Pipelining
     • Assume r5 != r4.
     • Assume there is one memory for instructions and data.
     • During a cycle, either data can be loaded or stored for an instruction OR an instruction can be fetched, not both.
     (400) Draw the 5-stage pipeline for this code, assuming the stages are Fetch, Decode, Execute, Memory, and Writeback. What is the total execution time?
         str r0, [r1, #16]
         ldr r0, [r1, #8]
         cmp r5, r4
         beq label
         add r5, r2, r4
         add r5, r5, r0
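A minimal Python sketch of the timing for this slide is below. It rests on stated assumptions that are not part of the slide. There is one memory port, and a data access (ldr/str in MEM) always wins over instruction fetch. Every stage takes one cycle. Data hazards cause no stalls, as if full forwarding were available. The beq is not taken (r5 != r4) and costs no penalty. `Instr`, `schedule`, `total_cycles` and `diagram` are hypothetical helpers written for illustration.

```python
from dataclasses import dataclass

# Classic 5-stage pipeline, one cycle per stage (assumption).
STAGES = ["IF", "ID", "EX", "MEM", "WB"]
MEM_OFFSET = STAGES.index("MEM")  # cycles from fetch to MEM (3)
WB_OFFSET = STAGES.index("WB")    # cycles from fetch to WB (4)


@dataclass
class Instr:  # hypothetical helper type
    text: str
    uses_data_memory: bool  # True for ldr/str


def schedule(program):
    """Return the fetch cycle of each instruction (the first cycle is 1).

    Only the single-memory structural hazard is modelled: no fetch may occur
    in a cycle where an earlier ldr/str is in MEM, so the fetch is delayed.
    """
    data_cycles = set()  # cycles in which the shared memory serves data
    fetch_cycles = []
    cycle = 1
    for ins in program:
        while cycle in data_cycles:
            cycle += 1  # fetch stalls; a bubble enters the pipeline
        fetch_cycles.append(cycle)
        if ins.uses_data_memory:
            data_cycles.add(cycle + MEM_OFFSET)
        cycle += 1
    return fetch_cycles


def total_cycles(program):
    """Cycle in which the last instruction completes WB."""
    return schedule(program)[-1] + WB_OFFSET


def diagram(program):
    """One text row per instruction showing its stage in each cycle."""
    fetch = schedule(program)
    width = total_cycles(program)
    rows = []
    for ins, f in zip(program, fetch):
        cells = ["   ."] * width
        for k, name in enumerate(STAGES):
            cells[f - 1 + k] = f"{name:>4}"
        rows.append(f"{ins.text:<18}" + "".join(cells))
    return rows


PROGRAM = [
    Instr("str r0, [r1, #16]", True),
    Instr("ldr r0, [r1, #8]", True),
    Instr("cmp r5, r4", False),
    Instr("beq label", False),  # assumed not taken (r5 != r4), no penalty
    Instr("add r5, r2, r4", False),
    Instr("add r5, r5, r0", False),
]

if __name__ == "__main__":
    print("\n".join(diagram(PROGRAM)))
    print("total cycles:", total_cycles(PROGRAM))
```

Under this model, the fetch of `beq` waits until both the str and the ldr have finished their MEM cycles. The sketch reports 12 cycles, compared with 10 when the memory flags are cleared (separate instruction and data memories).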

  10. Pipelining
     • Assume r5 != r4.
     • Assume there is one memory for instructions and data.
     • During a cycle, either data can be loaded or stored for an instruction OR an instruction can be fetched, not both.
     (500) Assume a new processor in which the Execute stage (ALU) can be skipped when the offset of a memory operation is zero, so the Memory and Execute stages can now overlap in the pipeline. What speedup does this new architecture achieve?
         str r0, [r1, #0]
         ldr r0, [r10, #0]
         cmp r5, r4
         beq label
         add r5, r2, r4
         add r5, r5, r0

  11. DATA DEPENDENCIES

  12. Data Dependencies
     (100) Find all data dependencies in this sequence.
         ldr r1, [r1, #0]
         and r1, r1, r2
         ldr r2, [r1, #0]
         ldr r1, [r3, #0]
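One way to check an answer is to enumerate dependences mechanically. The Python sketch below describes each instruction by its destination and source registers. It reports true (RAW), anti (WAR) and output (WAW) dependences between pairs with no intervening write to the same register. `SEQUENCE`, `written_between` and `find_dependencies` are hypothetical names chosen for illustration.

```python
# Each instruction: (text, destination register, source registers).
SEQUENCE = [
    ("ldr r1, [r1, #0]", "r1", ("r1",)),
    ("and r1, r1, r2",   "r1", ("r1", "r2")),
    ("ldr r2, [r1, #0]", "r2", ("r1",)),
    ("ldr r1, [r3, #0]", "r1", ("r3",)),
]


def written_between(program, i, j, reg):
    """True if some instruction strictly between i and j writes reg."""
    return any(program[k][1] == reg for k in range(i + 1, j))


def find_dependencies(program):
    """Return (kind, earlier, later, register) tuples with 1-based indices.

    RAW: the later instruction reads a register the earlier one wrote.
    WAR: the later instruction writes a register the earlier one read.
    WAW: both instructions write the same register.
    """
    found = []
    for j, (_, dst_j, srcs_j) in enumerate(program):
        for i in range(j):
            _, dst_i, srcs_i = program[i]
            if dst_i in srcs_j and not written_between(program, i, j, dst_i):
                found.append(("RAW", i + 1, j + 1, dst_i))
            if dst_j in srcs_i and not written_between(program, i, j, dst_j):
                found.append(("WAR", i + 1, j + 1, dst_j))
            if dst_j == dst_i and not written_between(program, i, j, dst_j):
                found.append(("WAW", i + 1, j + 1, dst_j))
    return found


if __name__ == "__main__":
    for dep in find_dependencies(SEQUENCE):
        print(dep)
```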

  13. Data Dependencies
     (200) Find all hazards in this sequence, with and without forwarding, for a 5-stage pipeline. Assume the stages are: Fetch, Decode, Execute, Memory, and Writeback.
         ldr r1, [r1, #0]
         and r1, r1, r2
         ldr r2, [r1, #0]
         ldr r1, [r3, #0]

  14. Data Dependencies
     (300) To reduce the clock cycle time, we are considering splitting the MEM stage into two stages. Find all hazards in this sequence for the resulting 6-stage pipeline, with and without forwarding. Assume the stages are: Fetch, Decode, Execute, Memory (split into two stages), and Writeback.
         add r1, r2, r1
         ldr r2, [r1, #0]
         ldr r1, [r1, #4]
         or r3, r1, r2
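The stall counting behind the two hazard questions above can be sketched in Python. This is a model under stated assumptions, not an answer key:

- instructions issue in order, one per cycle, and stalls are inserted in Decode;
- the register file is written in the first half of a cycle and read in the second;
- when forwarding is enabled, it is full forwarding, and load data is ready only after the last MEM stage.

Only RAW dependences are counted, because WAR and WAW dependences cannot cause hazards in this in-order pipeline. `mem_stages=2` models the split-MEM variant. `Op`, `decode_cycles`, `total_cycles` and `stall_cycles` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Op:  # hypothetical helper type
    text: str
    dst: str
    srcs: tuple
    is_load: bool = False


def decode_cycles(program, forwarding, mem_stages=1):
    """Cycle in which each instruction is in Decode (first IF is cycle 1).

    Assumptions: in-order issue, stalls held in Decode, split-cycle register
    file (write first half, read second half), full forwarding if enabled.
    """
    wb_gap = 2 + mem_stages  # ID -> WB distance: EX, MEM stage(s), WB
    last_writer = {}  # register -> (decode cycle, is_load) of latest writer
    cycles = []
    for op in program:
        earliest = cycles[-1] + 1 if cycles else 2
        for reg in op.srcs:
            if reg not in last_writer:
                continue
            d, load = last_writer[reg]
            if not forwarding:
                gap = wb_gap          # read only after the producer's WB
            elif load:
                gap = 1 + mem_stages  # value ready after the last MEM stage
            else:
                gap = 1               # ALU result forwarded from EX
            earliest = max(earliest, d + gap)
        cycles.append(earliest)
        last_writer[op.dst] = (earliest, op.is_load)
    return cycles


def total_cycles(program, forwarding, mem_stages=1):
    return decode_cycles(program, forwarding, mem_stages)[-1] + 2 + mem_stages


def stall_cycles(program, forwarding, mem_stages=1):
    ideal = len(program) + 3 + mem_stages  # hazard-free completion time
    return total_cycles(program, forwarding, mem_stages) - ideal


SLIDE_13 = [
    Op("ldr r1, [r1, #0]", "r1", ("r1",), True),
    Op("and r1, r1, r2", "r1", ("r1", "r2")),
    Op("ldr r2, [r1, #0]", "r2", ("r1",), True),
    Op("ldr r1, [r3, #0]", "r1", ("r3",), True),
]
SLIDE_14 = [
    Op("add r1, r2, r1", "r1", ("r2", "r1")),
    Op("ldr r2, [r1, #0]", "r2", ("r1",), True),
    Op("ldr r1, [r1, #4]", "r1", ("r1",), True),
    Op("or r3, r1, r2", "r3", ("r1", "r2")),
]

if __name__ == "__main__":
    for fwd in (False, True):
        print("forwarding" if fwd else "no forwarding",
              stall_cycles(SLIDE_13, fwd),
              stall_cycles(SLIDE_14, fwd, mem_stages=2))
```

Under these assumptions, the first sequence needs 4 stall cycles without forwarding and 1 with forwarding. The second sequence on the 6-stage pipeline needs 6 and 2 stall cycles, respectively.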

  15. Data Dependencies
     • Assume all data memory values are 0.
     • Assume:
        • r0 = 0
        • r1 = -1
        • r2 = 31
        • r3 = 1500
     • Assume the processor has forwarding logic for hazards.
     (400) What is the first value to be forwarded, and what value does it override?
         add r1, r2, r1
         ldr r2, [r1, #0]
         ldr r1, [r1, #4]
         or r3, r1, r2

  16. Data Dependencies
     • Assume all data memory values are 0.
     • Assume:
        • r0 = 0
        • r1 = -1
        • r2 = 31
        • r3 = 1500
     (500) The hazard detection unit assumes forwarding was implemented, but the processor designers (UF students) forgot to implement it! What are the final register values? What should they be? Add NOPs to this sequence to ensure correct execution despite UF’s screw-up!
         add r1, r2, r1
         ldr r2, [r1, #0]
         ldr r1, [r1, #4]
         or r3, r1, r2

  17. BRANCH PREDICTION

  18. Branch Prediction
     (100) When building a branch prediction unit, decide for each of the following cases whether the best prediction is “branch not taken” or “branch taken”:
        • Branches associated with “if” statements
        • Branches associated with “else if” statements
        • Branches associated with “else” statements
        • Branches associated with “for” statements

  19. Branch Prediction (200) Design a dynamic branch predictor for if statements and loops. Describe how to implement it in hardware. What new hardware might it require?
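A common dynamic scheme is a table of 2-bit saturating counters indexed by the branch address. The predictor needs two wrong outcomes in a row before it changes its prediction, which suits loop branches that are taken many times and then fall through once. Below is a minimal Python sketch. The table size and the "weakly not taken" initial state are assumptions. `TwoBitPredictor` and `count_mispredictions` are hypothetical names.

```python
class TwoBitPredictor:
    """Branch history table of 2-bit saturating counters (hypothetical).

    Counter values 0-1 predict not taken, 2-3 predict taken. The table is
    indexed by the low bits of the branch address (PC).
    """

    def __init__(self, index_bits=4):
        self.mask = (1 << index_bits) - 1
        # Every entry starts at 1 ("weakly not taken"); an assumption.
        self.table = [1] * (1 << index_bits)

    def predict(self, pc):
        return self.table[pc & self.mask] >= 2

    def update(self, pc, taken):
        i = pc & self.mask
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


def count_mispredictions(predictor, trace):
    """trace: iterable of (pc, taken) pairs in program order."""
    misses = 0
    for pc, taken in trace:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


if __name__ == "__main__":
    # Hypothetical loop branch at PC 0x20: taken 9 times, then falls
    # through; the loop is entered twice.
    loop_trace = ([(0x20, True)] * 9 + [(0x20, False)]) * 2
    print(count_mispredictions(TwoBitPredictor(), loop_trace))
```

In hardware, this corresponds to the following pieces:
- a small table read during Fetch and written when the branch resolves;
- saturating increment/decrement logic for the counters;
- typically, a branch target buffer, so the target of a predicted-taken branch is known early.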

  20. Branch Prediction
     • Assume a static branch-not-taken prediction scheme.
     • Assume one element of the array at r2 is equal to 100.
     (300) How many times are the branches predicted correctly versus incorrectly?
         00: mov r1, #0
         01: mov r2, #DEADBEEF
       LOOP:
         02: ldr r3, [r2, r0, lsl #2]
         03: cmp r3, #100
         04: beq LABEL
         05: mov r4, r3
       LABEL:
         06: add r0, r0, #1
         07: cmp r0, #5
         08: beq LOOP
         09: mov r0, r4
         10: add r0, r0, #1

  21. Branch Prediction
     • Assume a static branch-not-taken prediction scheme.
     • Assume one element of the array at r2 is equal to 100.
     • Assume the PC pipeline is three instructions deep.
     • Assume the PC pipeline can be flushed in one cycle and must be fully flushed on a misprediction.
     • Assume a pipeline with the phases: Fetch, Decode, Issue, Execute, Memory, and Writeback.
     • Assume branches are evaluated in the Issue stage and the pipeline is flushed during Execute.
     (400) How many cycles does the loop take?
         00: mov r1, #0
         01: mov r2, #DEADBEEF
       LOOP:
         02: ldr r3, [r2, r0, lsl #2]
         03: cmp r3, #100
         04: beq LABEL
         05: mov r4, r3
       LABEL:
         06: add r0, r0, #1
         07: cmp r0, #5
         08: beq LOOP
         09: mov r0, r4
         10: add r0, r0, #1

  22. Branch Prediction
     • Assume a static branch-not-taken prediction scheme.
     • Assume the PC pipeline is three instructions deep.
     • Assume the PC pipeline can be flushed in one cycle and must be fully flushed on a misprediction.
     • Assume a pipeline with the phases: Fetch, Decode, Issue, Execute, Memory, and Writeback.
     • Assume branches are evaluated in the Issue stage and the pipeline is flushed during Execute.
     (500) Act as the compiler. Optimize the code for branch-not-taken prediction. How many cycles does it take?
         00: mov r1, #0
         01: mov r2, #DEADBEEF
       LOOP:
         02: ldr r3, [r2, r0, lsl #2]
         03: cmp r3, #100
         04: beq LABEL
         05: mov r4, r3
       LABEL:
         06: add r0, r0, #1
         07: cmp r0, #5
         08: beq LOOP
         09: mov r0, r4
         10: add r0, r0, #1

  23. PROCESSOR ARCHITECTURE

  24. Processor Architecture (100) For a five stage pipeline with stages: Fetch, Decode, Execute, Memory, and Writeback, describe what happens in each stage.

  25. Processor Architecture (200) Describe the purpose of a clock signal in a processor. Why do processors need clock signals?

  26. Processor Architecture (300) Describe how during the Decode phase registers are selected from the register file. How is this accomplished in hardware?
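One way to picture register selection is a decoder driving AND-OR multiplexers. The register-number fields of the instruction act directly as select lines for the two read ports. The Python sketch assumes the ARM data-processing (register operand) encoding: Rn in bits 19-16, Rd in bits 15-12, Rm in bits 3-0. `fields`, `decoder` and `read_port` are hypothetical helpers, and the register contents are made up.

```python
def fields(word):
    """Register fields of an ARM data-processing instruction (register form)."""
    return {
        "rn": (word >> 16) & 0xF,  # first operand register
        "rd": (word >> 12) & 0xF,  # destination register
        "rm": word & 0xF,          # second operand register
    }


def decoder(sel, n=16):
    """n-way one-hot decoder: exactly one output line is 1."""
    return [1 if i == sel else 0 for i in range(n)]


def read_port(regfile, sel):
    """Model one read port as AND-OR multiplexing of every register."""
    value = 0
    for enable, reg in zip(decoder(sel, len(regfile)), regfile):
        mask = 0xFFFFFFFF if enable else 0  # enable line gates all 32 bits
        value |= reg & mask
    return value


if __name__ == "__main__":
    regs = [10 * i for i in range(16)]  # hypothetical contents: r1=10, r2=20, ...
    word = 0xE0825004                   # add r5, r2, r4
    f = fields(word)
    a = read_port(regs, f["rn"])        # read port A <- r2
    b = read_port(regs, f["rm"])        # read port B <- r4
    print(f, a, b)
```

Because the field positions are fixed by the encoding, both read ports can start in Decode before the opcode is fully decoded. The Rd field is not read here; it is passed along for writeback.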

  27. Processor Architecture (400) Why must we allocate new registers in the datapath for the writeback register instead of reading it from the decode phase?

  28. Processor Architecture (500) Design a one bit full adder.
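A full adder can be built from two half adders plus an OR gate. The sum is a XOR b XOR cin, and the carry out is (a AND b) OR (cin AND (a XOR b)). The Python sketch below models the gates with bitwise operators on 0/1 values and prints the truth table. `full_adder` is a hypothetical name.

```python
def full_adder(a, b, cin):
    """One-bit full adder built from XOR, AND and OR gates."""
    p = a ^ b                   # first half adder: partial sum (propagate)
    s = p ^ cin                 # second half adder: final sum bit
    cout = (a & b) | (p & cin)  # carry if both inputs set, or propagate + cin
    return s, cout


if __name__ == "__main__":
    print(" a b cin | s cout")
    for a in (0, 1):
        for b in (0, 1):
            for cin in (0, 1):
                s, cout = full_adder(a, b, cin)
                print(f" {a} {b}  {cin}  | {s}  {cout}")
```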

  29. REPRESENTATION OF DATA

  30. Representation of Data (100) Describe the difference between big endian and little endian representations.

  31. Representation of Data
     (200) Represent the following data in big endian and little endian formats:
        • 00ac8eff
        • 54897743
        • be88fac8
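A quick way to check answers to byte-order questions like these is to let Python lay out the bytes with `int.to_bytes`. The sketch assumes 32-bit words and lists bytes from the lowest address (offset 0) to the highest. Big endian stores the most significant byte first; little endian stores the least significant byte first. `byte_layout` is a hypothetical helper.

```python
def byte_layout(word, order):
    """Bytes of a 32-bit word at increasing addresses, as hex strings."""
    return [f"{b:02x}" for b in word.to_bytes(4, order)]


if __name__ == "__main__":
    for word in (0x00AC8EFF, 0x54897743, 0xBE88FAC8):
        print(f"{word:08x}  big: {byte_layout(word, 'big')}  "
              f"little: {byte_layout(word, 'little')}")
```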

  32. Representation of Data
     (300) Represent the following data as hexadecimal numbers in big and little endian formats. Assume unsigned integers.
        • 128
        • 976

  33. Representation of Data
     (400) Represent the following data as hexadecimal numbers in big and little endian formats. Assume signed integers.
        • -55
        • 99
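For decimal inputs, the extra step is choosing a word size and, for signed values, two's complement. The Python sketch assumes 32-bit integers (the slides do not state a width). It relies on `int.to_bytes(..., signed=True)` to produce the two's-complement bytes. `encode` is a hypothetical helper.

```python
def encode(value, signed):
    """32-bit hex word for value (two's complement if signed) and byte orders."""
    raw = value.to_bytes(4, "big", signed=signed)  # assumes a 32-bit word
    word = int.from_bytes(raw, "big")
    big = " ".join(f"{b:02x}" for b in raw)              # lowest address first
    little = " ".join(f"{b:02x}" for b in reversed(raw))
    return f"0x{word:08x}", big, little


if __name__ == "__main__":
    for value, signed in ((128, False), (976, False), (-55, True), (99, True)):
        print(value, *encode(value, signed))
```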

  34. Representation of Data (500) Write assembly code which takes data from one register in Big Endian format and stores it in a new register in Little Endian format. You may use temporary registers.
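Converting between the two formats amounts to reversing the four bytes of the word. The Python sketch below shows a shift-and-mask sequence that an assembly solution could mirror. Each masked term corresponds roughly to an AND with a shifted operand, combined with ORR, using temporary registers. ARMv6 and later cores also provide a single `REV` instruction that performs this reversal. `swap_endianness` is a hypothetical name.

```python
MASK32 = 0xFFFFFFFF


def swap_endianness(x):
    """Reverse the byte order of a 32-bit word using shifts and masks."""
    x &= MASK32
    b0 = (x & 0x000000FF) << 24  # byte 0 -> byte 3
    b1 = (x & 0x0000FF00) << 8   # byte 1 -> byte 2
    b2 = (x >> 8) & 0x0000FF00   # byte 2 -> byte 1
    b3 = (x >> 24) & 0x000000FF  # byte 3 -> byte 0
    return b0 | b1 | b2 | b3


if __name__ == "__main__":
    print(f"{swap_endianness(0x00AC8EFF):08x}")
```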

  35. FINAL QUESTION

  36. Final Question
     • Each team should decide on an amount of points to bid.
     • Write down your bid on a sheet of paper and hand it in.
     • You will have only 60 seconds to answer the next question as a team; write your answer down before the time limit.
     • Answer correctly and you will add your bid to your score.
     • Answer incorrectly and you will lose those points.

  37. Final Question
     In order to detect data hazards, new hardware must be added. Assuming that the register IDs involved in an instruction are available during the Decode stage, what hardware would be needed to check for data hazards?
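One way to see what the checking hardware computes is to write out its logic. The sketch is in Python, but each `==` on register numbers stands for a register-ID comparator, 4 bits wide for ARM's 16 registers. The comparator outputs are ANDed with control bits latched in the pipeline registers and then ORed into a stall signal.

The model assumes a 5-stage pipeline whose register file is written in the first half of a cycle. This lets a value in Writeback be read in Decode without a hazard. `PipeReg` and `must_stall` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PipeReg:
    """Fields latched in a pipeline register (hypothetical names)."""
    rd: Optional[int]       # destination register number, None if no write
    reg_write: bool         # instruction will write the register file
    mem_read: bool = False  # instruction is a load


def must_stall(src_regs, id_ex, ex_mem, forwarding):
    """Compare the decoding instruction's source IDs with later stages."""
    def hits(stage):
        # One comparator per (source, destination) pair, gated by reg_write.
        return stage.reg_write and stage.rd in src_regs

    if forwarding:
        # Only a load directly ahead cannot be forwarded in time.
        return id_ex.mem_read and hits(id_ex)
    # Without forwarding, a pending write in EX or MEM is a hazard
    # (WB is covered by the split-cycle register file assumption).
    return hits(id_ex) or hits(ex_mem)
```

With forwarding, only the comparators against the ID/EX destination matter, and only for loads. Without forwarding, the EX/MEM destination must be compared as well.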

  38. WRAP UP

  39. For next time
     • Enjoy your spring break!
     • Read Chapter 5, sections 5.1 – 5.3
