
Lecture 13: Pipelining

Lecture 13: Pipelining. Computer Engineering 585, Fall 2001. Delayed Branch (Summary): where to get instructions to fill the branch delay slot? From before the branch instruction; from the target address (only valuable when the branch is taken); or from the fall-through path (only valuable when the branch is not taken).

idalee

Presentation Transcript


  1. Lecture 13: Pipelining Computer Engineering 585 Fall 2001

  2. Delayed Branch (Summary)
     • Where to get instructions to fill the branch delay slot?
       • From before the branch instruction
       • From the target address: only valuable when the branch is taken
       • From the fall-through path: only valuable when the branch is not taken
     • Cancelling branches allow more slots to be filled
     • Compiler effectiveness for a single branch delay slot:
       • Fills about 60% of branch delay slots
       • About 80% of instructions executed in branch delay slots are useful in computation
       • About 50% (60% × 80%) of slots are usefully filled
     • Delayed branch downside: with 7-8 stage pipelines and multiple instructions issued per clock (superscalar), a single delay slot no longer covers the branch delay

  3. Evaluating Branch Alternatives

     Scheduling scheme | Branch penalty | CPI  | Speedup vs. unpipelined | Speedup vs. stall
     Stall pipeline    | 3              | 1.42 | 3.5                     | 1.0
     Predict taken     | 1              | 1.14 | 4.4                     | 1.26
     Predict not taken | 1              | 1.09 | 4.5                     | 1.29
     Delayed branch    | 0.5            | 1.07 | 4.6                     | 1.31

     Conditional and unconditional branches = 14% of instructions; 65% of them change the PC.
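The CPI column above appears to follow from CPI = 1 + branch frequency × branch penalty, where predict-not-taken pays the penalty only on the 65% of branches that change the PC. The sketch below reproduces that arithmetic. The function name and the five-stage pipeline depth used for the speedup over an unpipelined machine are assumptions for illustration, not something the slide states.

```python
# Sketch of the CPI arithmetic behind the table (names are illustrative).
BRANCH_FREQ = 0.14      # conditional + unconditional branches, from the slide
TAKEN_FRACTION = 0.65   # fraction of branches that change the PC, from the slide
PIPELINE_DEPTH = 5      # ASSUMPTION: classic five-stage pipeline


def branch_cpi(penalty, fraction_paying=1.0):
    """Ideal CPI of 1 plus the average branch stall cycles per instruction."""
    return 1.0 + BRANCH_FREQ * fraction_paying * penalty


schemes = {
    "Stall pipeline": branch_cpi(3),
    "Predict taken": branch_cpi(1),
    # Only taken branches pay the penalty when predicting not taken.
    "Predict not taken": branch_cpi(1, TAKEN_FRACTION),
    "Delayed branch": branch_cpi(0.5),
}

for name, cpi in schemes.items():
    speedup = PIPELINE_DEPTH / cpi  # relative to an unpipelined machine
    print(f"{name:18s} CPI = {cpi:.3f}  speedup = {speedup:.2f}")
```

The computed CPIs round to the table's values. The speedups come out slightly higher than some table entries, which suggests the slide truncated rather than rounded them.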

  4. Overall Costs of Branch Schemes

     Columns: branch penalty per conditional branch (Integer, FP); penalty per unconditional branch; average branch penalty per branch (Integer, FP); effective CPI with branch stalls (Integer, FP).

     Scheduling scheme | Cond. Int | Cond. FP | Uncond. | Avg Int | Avg FP | CPI Int | CPI FP
     Stall pipeline    | 1.00      | 1.00     | 1.00    | 1.00    | 1.00   | 1.17    | 1.15
     Predict taken     | 1.00      | 1.00     | 1.00    | 1.00    | 1.00   | 1.17    | 1.15
     Predict not taken | 0.62      | 0.70     | 1.0     | 0.69    | 0.74   | 1.12    | 1.11
     Delayed branch    | 0.35      | 0.25     | 0.0     | 0.30    | 0.21   | 1.06    | 1.03

  5. Dynamic Branch Prediction
     • Need BTA and Condition.
     • Earliest: IF stage.
     [Figure: PC, I-Cache, and a table with columns Inst Address and Condition (addresses 1000 0011, 0100 1000, 0001 0000, 0000 1111; conditions taken / not taken).]
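One way to read this slide is as a small table, looked up with the PC during IF, that supplies both a predicted condition and a BTA. The sketch below is a minimal illustration under that reading. The class name, the last-outcome update policy, the 4-byte instruction size, and the target address are all assumptions, since the slide does not specify them.

```python
class BranchPredictionTable:
    """Toy table mapping an instruction address to (last condition, BTA)."""

    def __init__(self, inst_size=4):
        self.inst_size = inst_size  # ASSUMPTION: fixed-size instructions
        self.entries = {}           # inst address -> (taken, bta)

    def predict_next_pc(self, pc):
        """Consulted in IF: return the BTA if the branch is predicted taken."""
        entry = self.entries.get(pc)
        if entry is not None:
            taken, bta = entry
            if taken:
                return bta
        return pc + self.inst_size  # no entry or predicted not taken

    def update(self, pc, taken, bta):
        """Record the resolved outcome (ASSUMPTION: last-outcome policy)."""
        self.entries[pc] = (taken, bta)


table = BranchPredictionTable()
branch_pc = 0b1000_0011                          # address pattern from the slide
first_guess = table.predict_next_pc(branch_pc)   # no entry yet: fall through
table.update(branch_pc, taken=True, bta=0x40)    # 0x40 is a made-up target
second_guess = table.predict_next_pc(branch_pc)  # now predicts the stored BTA
```

Keying the table by instruction address is what makes the lookup possible in IF, before the instruction has even been decoded as a branch.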

  6. Exception Handling
     • I/O
     • Internal Exceptions
       • Arithmetic overflow
       • Illegal instruction
     • Memory Address Exceptions
       • Protection violation
       • Data alignment error
       • Page fault
     • Program initiated
       • OS services
       • Trap
       • Debugging (breakpoints)

  7. Exception Types
     • Synchronous/asynchronous
     • Program initiated/forced
     • Within or between instructions
     • Restartable/fatal

  8. Exception Handling Terminology

     Exception event | IBM 360 | VAX | Motorola 680x0 | Intel 80x86
     I/O device request | Input/output interruption | Device interrupt | Exception (Level 0...7 autovector) | Vectored interrupt
     Invoking the operating system service from a user program | Supervisor call interruption | Exception (change mode supervisor trap) | Exception (unimplemented instruction) on Macintosh | Interrupt (INT instruction)
     Tracing instruction execution | Not applicable | Exception (trace fault) | Exception (trace) | Interrupt (single-step trap)
     Breakpoint | Not applicable | Exception (breakpoint fault) | Exception (illegal instruction or breakpoint) | Interrupt (breakpoint trap)
     Integer arithmetic overflow or underflow; FP trap | Program interruption (overflow or underflow exception) | Exception (integer overflow trap or floating underflow fault) | Exception (floating-point coprocessor errors) | Interrupt (overflow trap or math unit exception)
     Page fault (not in main memory) | Not applicable (only in 370) | Exception (translation not valid fault) | Exception (memory-management unit errors) | Interrupt (page fault)
     Misaligned memory accesses | Program interruption (specification exception) | Not applicable | Exception (address error) | Not applicable
     Memory protection violations | Program interruption (protection exception) | Exception (access control violation fault) | Exception (bus error) | Interrupt (protection exception)
     Using undefined instructions | Program interruption (operation exception) | Exception (opcode privileged/reserved fault) | Exception (illegal instruction or breakpoint/unimplemented instruction) | Interrupt (invalid opcode)
     Hardware malfunctions | Machine-check interruption | Exception (machine-check abort) | Exception (bus error) | Not applicable
     Power failure | Machine-check interruption | Urgent interrupt | Not applicable | Nonmaskable interrupt

  9. Interrupts and Pipeline
     • The hard ones are the restartable interrupts, especially those that occur within an instruction in the later pipeline stages (such as page faults and arithmetic exceptions).
     • A page fault in the MEM stage affects not only the faulting instruction; it also aborts the three instructions following it, which are already in the pipeline.
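The count of three aborted instructions follows from the pipeline position of the fault: with one instruction entering per cycle, the number of younger instructions already in flight equals the index of the faulting stage. A minimal sketch, assuming the classic five-stage IF/ID/EX/MEM/WB pipeline (the slide names only IF and MEM) and hypothetical function names:

```python
# ASSUMPTION: classic five-stage pipeline; the slides name only IF and MEM.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def younger_instructions_in_flight(fault_stage):
    """Instructions fetched after the faulting one that are already in the pipe."""
    return STAGES.index(fault_stage)


def squash_list(fault_index, fault_stage):
    """Program-order indices of the instructions that must be aborted."""
    count = younger_instructions_in_flight(fault_stage)
    return [fault_index + k for k in range(1, count + 1)]


# Instruction 10 page-faults in MEM: 11, 12 and 13 occupy EX, ID and IF.
aborted = squash_list(10, "MEM")
print(aborted)
```

This also shows why faults in later stages are harder: a fault detected in ID has a single younger instruction to discard, while one detected in MEM has three.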

  10. Interrupts & Pipeline
      • The objective is to maintain equivalence with the handling of the same interrupt on an unpipelined machine. One hopes to achieve precise interrupts: the interrupt has no effect on the preceding instructions, which complete normally, while the interrupting instruction and the instructions following it are restarted.

  11. Precise Interrupt Handling
      [Pipeline timing diagram: cycles T-1 through T+6, with the page fault marked.]
      Let I_{T-1} finish.
      Force IF (at time T+4) from the interrupt handler address.
      Save the PC for I_T, the first instruction to be restarted.
      Restart there after the trap: if it is a non-branch instruction, fetch later instructions sequentially; otherwise compute the branch condition and BTA, then fetch later instructions.

  12. Interrupt handling contd.
      • The writes of the following instructions are turned off by squashing or zeroing them.
      • The PC value must be saved before control enters the exception handler.
      • How many PCs need to be saved?
        • Normally just one: that of the interrupting (faulting) instruction.
      • The restoring protocol is to fetch from the saved PC, increment the PC, fetch again, and so on, unless a branch is encountered.
      • For a branch, evaluate its condition and BTA and load the PC with the appropriate address.
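The restoring protocol on this slide can be sketched as a fetch loop that starts at the single saved PC. The toy program encoding, the 4-byte instruction size, and the function name below are assumptions made purely for illustration.

```python
def restart_fetch_sequence(program, saved_pc, steps, inst_size=4):
    """Return the PCs fetched after the handler returns to saved_pc.

    `program` maps an address to a dict describing the instruction there
    (a hypothetical encoding: kind, and for branches, taken and bta).
    """
    pc = saved_pc
    fetched = []
    for _ in range(steps):
        fetched.append(pc)
        inst = program.get(pc)
        if inst is not None and inst["kind"] == "branch" and inst["taken"]:
            pc = inst["bta"]   # branch: load PC with the target address
        else:
            pc += inst_size    # otherwise increment PC and keep fetching
    return fetched


toy_program = {
    100: {"kind": "load"},                               # faulting instruction
    104: {"kind": "branch", "taken": True, "bta": 200},
    200: {"kind": "alu"},
}
sequence = restart_fetch_sequence(toy_program, saved_pc=100, steps=4)
print(sequence)
```

One saved PC is enough here because every later fetch address can be recomputed from it, either by incrementing or by re-evaluating a branch.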

  13. Interrupt handling contd.
      • With delayed branches, things are not that simple!
      [Figure: two saved program counters, PC1 and PC2.]
      The restoring protocol: fetch from PC1, then fetch from PC2, and then move on sequentially.
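A plausible reading of the two-PC protocol is that the faulting instruction sits in the delay slot of a taken branch, so the next fetch address is the branch target rather than PC1 plus one instruction. The slide does not spell this out, so the sketch below treats it as an assumption. The function name, addresses, and 4-byte instruction size are also illustrative.

```python
def restart_with_two_pcs(pc1, pc2, steps, inst_size=4):
    """Fetch from PC1, then PC2, then continue sequentially from PC2."""
    fetched = [pc1, pc2]
    while len(fetched) < steps:
        fetched.append(fetched[-1] + inst_size)
    return fetched[:steps]


# ASSUMPTION: 108 is a delay-slot instruction that faulted, and 200 is the
# target of the (already resolved) taken branch that precedes it.
sequence = restart_with_two_pcs(pc1=108, pc2=200, steps=4)
print(sequence)
```

With a single saved PC the restart would continue at 112 and lose the effect of the branch, which is why two PCs are saved in this case.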
