CSCE 212 Chapter 6 Enhancing Performance with Pipelining

Instructor: Jason D. Bakos. Topics: pipelining and the MIPS pipeline. Basic idea: execute multiple instructions in parallel by splitting instruction execution into 5 stages, so that instructions execute in “assembly-line” fashion: fetch, decode, execute, memory, write back.

Presentation Transcript


  1. CSCE 212 Chapter 6: Enhancing Performance with Pipelining. Instructor: Jason D. Bakos

  2. Pipelining

  3. MIPS Pipeline
     • Basic idea:
       • Execute multiple instructions in parallel
       • Split instruction execution into 5 stages
       • Instructions execute in “assembly-line” fashion
     [Figure: five-stage datapath: fetch, decode, execute, memory, write back. Labeled registers: instruction register; A, B registers (control for execute/memory/wb, rs/rt/rd); R register (control for memory/wb, rs/rt/rd); MDR register (control for wb, rs/rt/rd). Other labels: PC, RegFile, ALU, op/func, ctrl/NOOP, SE/imm, SE/imm*4, SHAMT, MemRead, MemWrite, Address, MemoryIn, MemoryOut.]

  4. Pipelined MIPS

  5. Pipelined MIPS

  6. Pipelined Control

  7. Pipelined Control

  8. Pipelined Control

  9. MIPS ISA
     • MIPS pipeline stages
       • Fetch (F)
         • read next instruction from memory, increment address counter
         • assume 1 cycle to access memory
       • Decode (D)
         • read register operands, resolve instruction into control signals, compute branch target
       • Execute (E)
         • execute arithmetic/resolve branches
       • Memory (M)
         • perform load/store accesses to memory, take branches
         • assume 1 cycle to access memory
       • Write back (W)
         • write arithmetic results to register file
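The stage list above implies a simple timing model when there are no hazards: each instruction enters Fetch one cycle after its predecessor and advances one stage per cycle. The following is a minimal sketch of that ideal case only; the function names (`ideal_schedule`, `total_cycles`, `render`) are hypothetical and not from the slides.

```python
STAGES = ["F", "D", "E", "M", "W"]  # stage letters as used on the slide


def ideal_schedule(n_instructions):
    """Return one dict per instruction mapping cycle number -> stage letter.

    Assumes no hazards: instruction i (0-based) enters Fetch in cycle i + 1.
    """
    return [
        {i + 1 + s: STAGES[s] for s in range(len(STAGES))}
        for i in range(n_instructions)
    ]


def total_cycles(n_instructions):
    """Cycles until the last instruction leaves write back (ideal case)."""
    if n_instructions == 0:
        return 0
    # The first instruction takes 5 cycles; each later one finishes 1 cycle after it.
    return n_instructions + len(STAGES) - 1


def render(schedule):
    """Draw the schedule as text, one row per instruction, '.' for idle cycles."""
    last = max(max(row) for row in schedule)
    return "\n".join(
        " ".join(row.get(cycle, ".") for cycle in range(1, last + 1))
        for row in schedule
    )


if __name__ == "__main__":
    print(render(ideal_schedule(3)))
    print("cycles for 8 instructions:", total_cycles(8))
```

Under this model, n instructions finish in n + 4 cycles, so the CPI approaches 1 as n grows. Hazards add cycles on top of this baseline.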

  10. Hazards
      • Hazards are data flow problems that arise as a result of pipelining
        • They limit the amount of parallelism and sometimes induce “penalties” that prevent completing one instruction per clock cycle
      • Structural hazards
        • Two operations require a single piece of hardware
        • Structural hazards can be overcome by adding additional hardware
      • Control hazards
        • Conditional control instructions are not resolved until late in the pipeline, requiring subsequent instruction fetches to be predicted
        • Flushed if prediction does not hold (make sure no state change)
        • Branch hazards can use dynamic prediction/speculation, branch delay slot
      • Data hazards
        • An instruction in one pipeline stage is “dependent” on data computed in another pipeline stage

  11. Hazards
      • Data hazards
        • Register values are “read” in decode and written during write-back
        • A RAW hazard occurs when dependent instructions are separated by less than 2 slots
        • Examples (one case per column):
            ADD $2,$X,$X (E)    ADD $2,$X,$X (M)    ADD $2,$3,$4 (W)
            ADD $X,$2,$X (D)    …                   …
            …                   ADD $X,$2,$X (D)    …
            …                   …                   ADD $X,$2,$3 (D)
        • In most cases, data is generated in the same stage in which it is required (EX)
        • Data forwarding (one case per column):
            ADD $2,$X,$X (M)    ADD $2,$X,$X (W)    ADD $2,$3,$4 (out-of-pipe)
            ADD $X,$2,$X (E)    …                   …
            …                   ADD $X,$2,$X (E)    …
            …                   …                   ADD $X,$2,$3 (E)
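The RAW cases above can be found mechanically by scanning a program for a register that is written and then read a few instructions later. The sketch below is one way to do that, not the course's method; `raw_hazards` and the `(dest, sources)` tuple encoding are hypothetical. The `window` default of 3 matches the three example columns. A window of 2 would model a register file that writes in the first half of a cycle and reads in the second.

```python
def raw_hazards(program, window=3):
    """Find read-after-write dependences between nearby instructions.

    program: list of (dest_register, source_registers) tuples in program order.
    window:  how many instructions back a producer can be and still conflict
             (an assumption; see the note above).
    Returns a list of (producer_index, consumer_index, register) triples.
    """
    hazards = []
    for consumer, (_, sources) in enumerate(program):
        for reg in sorted(set(sources)):
            # Walk backwards so that only the most recent writer is reported.
            oldest = max(consumer - window, 0)
            for producer in range(consumer - 1, oldest - 1, -1):
                if program[producer][0] == reg:
                    hazards.append((producer, consumer, reg))
                    break
    return hazards


if __name__ == "__main__":
    program = [
        ("$2", ("$3", "$4")),  # ADD $2,$3,$4
        ("$5", ("$2", "$6")),  # ADD $5,$2,$6  reads $2 one slot later
        ("$7", ("$2", "$5")),  # ADD $7,$2,$5  reads $2 two slots later, $5 one slot later
    ]
    for producer, consumer, reg in raw_hazards(program):
        print(f"instruction {consumer} reads {reg} written by instruction {producer}")
```

With forwarding, none of these ALU-to-ALU dependences requires a stall, because the result is available at the end of EX and is routed directly to the consumer's EX stage.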

  12. “Load” Hazards
      • Stalls are required when data is not produced in the same stage in which a subsequent instruction needs it
      • Example:
          LW $2, 0($X) (M)
          ADD $X, $2 (E)
      • When this occurs, insert a “bubble” into the EX stage and stall F and D:
          LW $2, 0($X) (W)
          NOOP (M)
          ADD $X, $2 (E)
      • Forward from W to E
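The load-use rule above reduces to a check on adjacent instructions: if a load is immediately followed by an instruction that reads the loaded register, one bubble is needed. The sketch below assumes full forwarding, so only this case stalls. The `(opcode, dest, sources)` encoding and the name `load_use_stalls` are hypothetical; the instruction sequence is the loop body from the example slide later in the deck.

```python
def load_use_stalls(program):
    """Count one-cycle bubbles caused by a load followed directly by a reader.

    program: list of (opcode, dest_register_or_None, source_registers).
    Assumes forwarding handles every other dependence without a stall.
    """
    stalls = 0
    for previous, current in zip(program, program[1:]):
        opcode, dest, _ = previous
        if opcode == "lw" and dest is not None and dest in current[2]:
            stalls += 1
    return stalls


if __name__ == "__main__":
    loop_body = [
        ("add",  "$6", ("$5", "$2")),  # add  $6,$5,$2
        ("lw",   "$7", ("$6",)),       # lw   $7,0($6)
        ("addi", "$7", ("$7",)),       # addi $7,$7,10  <- needs the loaded $7
        ("add",  "$6", ("$4", "$2")),  # add  $6,$4,$2
        ("sw",   None, ("$7", "$6")),  # sw   $7,0($6)
        ("addi", "$2", ("$2",)),       # addi $2,$2,4
        ("blt",  None, ("$2", "$3")),  # blt  $2,$3,loop
    ]
    print("load-use bubbles:", load_use_stalls(loop_body))
```

A compiler can often remove such a bubble by moving an independent instruction between the load and its consumer.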

  13. Data Hazards: Forwarding

  14. Data Hazards: Stalling for Load Hazard

  15. Control Hazards
      • Need to make a branch decision based on data that has yet to be produced:
          add $2,$3,$4
          beqz $2,loop
      • In which stage is the branch resolved?
      • Approaches:
        • stall
          • insert bubbles after all branches
        • always predict untaken
          • if taken, instructions entering DEC and EX (and MEM?) transfer as NOOPs
        • branch delay slot
          • instruction following branch is always executed
        • dynamic branch predictors

  16. Control Hazards
      • Instructions are fetched every clock cycle
      • Branch decisions happen in the EX stage
      • Solutions:
        • Assume branch not taken (performs a flush of IF, ID, EX by inserting a nop into the pipeline registers on the clock edge)
        • Reduce the delay by moving the branch decision up
          • Requires additional hardware (comparators, etc.)
          • Might increase cycle time, since register read and resolution are now in series and must be performed in half a cycle to allow for parallel register writes!
          • Requires forwarding and stall hardware for new data hazards
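To see why moving the branch decision earlier matters, it helps to put rough numbers on the flush cost. The sketch below uses the standard back-of-the-envelope model for an always-predict-untaken pipeline: an ideal CPI of 1 plus the flush penalty paid on every taken branch. The function name, the branch frequencies, and the penalty values are illustrative assumptions rather than figures from the slides, and data-hazard stalls are ignored.

```python
def cpi_predict_not_taken(branch_fraction, taken_fraction, flush_penalty):
    """Average CPI when every branch is predicted untaken.

    branch_fraction: fraction of executed instructions that are branches.
    taken_fraction:  fraction of those branches that are actually taken.
    flush_penalty:   instructions discarded each time a branch is taken.
    """
    return 1.0 + branch_fraction * taken_fraction * flush_penalty


if __name__ == "__main__":
    # Illustrative workload (assumed): 20% branches, 60% of them taken.
    late = cpi_predict_not_taken(0.20, 0.60, flush_penalty=3)   # IF, ID, EX flushed
    early = cpi_predict_not_taken(0.20, 0.60, flush_penalty=1)  # decision moved up to decode
    print(f"late resolution:  CPI = {late:.2f}")
    print(f"early resolution: CPI = {early:.2f}")
```

The gain from the smaller penalty has to be weighed against the possible increase in cycle time noted above.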

  17. Example
        add  $6,$5,$2
        lw   $7,0($6)
        addi $7,$7,10
        add  $6,$4,$2
        sw   $7,0($6)
        addi $2,$2,4
        blt  $2,$3,loop
        add  $6,$5,$2
      [Figure: pipeline timing diagram showing the F, D, E, M, W stages of each instruction across cycles 1 to 15]
      8 instructions, 15 - 4 cycles, CPI = 11/8
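The arithmetic behind the CPI figure on this slide can be checked directly. The sketch below only restates the slide's numbers; the variable names are hypothetical, and the reading of the subtracted 4 as the pipeline fill time (5 stages minus 1) is an assumption.

```python
from fractions import Fraction

PIPELINE_STAGES = 5

instructions = 8   # instructions shown on the slide
last_cycle = 15    # cycle in which the last instruction writes back
fill_cycles = PIPELINE_STAGES - 1  # assumed meaning of the "- 4" on the slide

effective_cycles = last_cycle - fill_cycles        # 15 - 4 = 11
cpi = Fraction(effective_cycles, instructions)     # 11/8

# With no hazards, 8 instructions would finish in 8 + 4 = 12 cycles,
# so the remaining cycles are stall or flush cycles.
ideal_last_cycle = instructions + fill_cycles
extra_cycles = last_cycle - ideal_last_cycle

if __name__ == "__main__":
    print(f"CPI = {cpi} = {float(cpi)}")
    print(f"extra cycles beyond the ideal schedule: {extra_cycles}")
```

The 3 extra cycles are what push the CPI above the ideal value of 1.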

  18. Moving up Branch Resolution

  19. Moving up Branch Resolution

  20. Scheduling the Branch Delay Slot

  21. Dynamic Branch Prediction
      • Assume taken/not-taken (static)
        • Loops have branches that are usually taken
      • When wrong, we flush pipeline stages
      • Deeper pipelines have higher branch penalties (misprediction penalty)
      • Solution:
        • Look up address of branch to check if branch was previously taken
        • One-bit schemes
        • Two-bit schemes (must be wrong twice to change prediction)
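The difference between one-bit and two-bit schemes shows up on a loop branch that is taken several times and then falls through. The sketch below simulates a single branch under each scheme. The slide does not say which two-bit scheme is meant, so a saturating counter is assumed here, and the function names and the outcome pattern are hypothetical.

```python
def one_bit_mispredictions(outcomes, initial=False):
    """One-bit scheme: predict whatever the branch did last time."""
    prediction, misses = initial, 0
    for taken in outcomes:
        if prediction != taken:
            misses += 1
        prediction = taken  # remember only the most recent outcome
    return misses


def two_bit_mispredictions(outcomes, initial=0):
    """Two-bit saturating counter (assumed scheme).

    Counter values 0 and 1 predict not taken; 2 and 3 predict taken.
    From a saturated state it takes two wrong guesses to flip the prediction.
    """
    counter, misses = initial, 0
    for taken in outcomes:
        if (counter >= 2) != taken:
            misses += 1
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses


if __name__ == "__main__":
    # A loop branch taken 4 times, then not taken at loop exit; the loop runs 3 times.
    outcomes = ([True] * 4 + [False]) * 3
    print("one-bit misses:", one_bit_mispredictions(outcomes, initial=True))
    print("two-bit misses:", two_bit_mispredictions(outcomes, initial=3))
```

Once warmed up, the one-bit predictor misses twice per loop execution (at the exit and again on re-entry), while the two-bit counter misses only at the exit.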
