1 / 21

EECS 470

EECS 470. ILP and Exceptions Lecture 7 Coverage: Chapter 3. Optimizing CPU Performance. Golden Rule: t CPU = N inst *CPI*t CLK Given this, what are our options Reduce the number of instructions executed Reduce the cycles to execute an instruction Reduce the clock period

justus
Download Presentation

EECS 470

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. EECS 470 ILP and Exceptions Lecture 7 Coverage: Chapter 3

  2. Optimizing CPU Performance • Golden Rule: tCPU = Ninst*CPI*tCLK • Given this, what are our options • Reduce the number of instructions executed • Reduce the cycles to execute an instruction • Reduce the clock period • Our first focus: Reducing CPI • Approach: Instruction Level Parallelism (ILP)

  3. Why ILP? • Requirements • Parallelism • Large window • Limited control deps • Eliminate “false” deps • Find run-time deps Vs.

  4. How Much ILP is There?

  5. How Large Must the “Window” Be?

  6. ALU Operation GOOD, Branch BAD Expected Number of Branches Between Mispredicts E(X) ~ 1/(1-p) E.g., p = 95%, E(X) ~ 20 brs, 100-ish insts

  7. How Accurate are Branch Predictors?

  8. Impact of Physical Storage Limitations • Each instruction “in flight” must have storage for its result • Really worse than this because of mispeculation…

  9. Registers GOOD, Memory BAD • Benefits of registers • Well described deps • Fast access • Finite resource • Memory loses these benefits for flexibility *p = … *q = … … = *p ?

  10. “Bottom Line” for an Ambitious Design

  11. First Optimization: Out-of-Order Writeback

  12. Playing by the Rules: In-order Writeback IF ID D1 D2 D3 D4 D5 MEM WB DIV.D ADD IF ID EX MEM WB

  13. Playing by the Rules: In-order Writeback Divide by Zero! IF ID D1 D2 D3 D4 D5 MEM WB DIV.D ADD IF ID EX MEM WB What’s wrong with this picture?

  14. Playing by the Rules: In-order Writeback Divide by Zero! IF ID D1 D2 D3 D4 D5 MEM WB DIV.D ADD IF ID EX MEM WB What’s wrong with this picture? IF ID D1 D2 D3 D4 D5 MEM WB DIV.D ADD IF ID EX MEM WB stall stall stall stall

  15. Another Way to Get in the Same Mess • Many systems use microcode • Simplifies mapping of complex instructions to CPU resources • iA32 add-with-carry • ADC (EAX),EBXtmp = MEM[EAX]tmp = tmp + EBX+CF, update CFMEM[EAX] = tmp Side Effect! Potential Fault!

  16. Exceptions and Interrupts

  17. Solution: Precise Interrupts • Implementation approaches • Don’t • E.g., Cray-1 • Force in-order WB • E.g., ARM SA-1 • Force in-order checks • E.g., Alpha 21064 • Buffer speculative results • E.g., P4, Alpha 21264 • History buffer • Future file/Reorder buffer Instructions Completely Finished Precise State PC Speculative State No Instruction Has Executed At All

  18. ROB Precise Interrupts via the Reorder Buffer • @ Alloc • Allocate result storage at Tail • @ Sched • Get inputs (ROB T-to-H then ARF) • Wait until all inputs ready • @ WB • Write results/fault to ROB • Indicate result is ready • @ CT • Wait until inst @ Head is done • If fault, initiate handler • Else, write results to ARF • Deallocate entry from ROB Any order MEM IF ID Alloc Sched EX CT In-order In-order ARF PC Dst regID Dst value Except? Head Tail • Reorder Buffer (ROB) • Circular queue of spec state • May contain multiple definitions of same register

  19. Reorder Buffer Example ROB Code Sequence f1 = f2 / f3 r3 = r2 + r3 r4 = r3 – r2 Initial Conditions - reorder buffer empty - f2 = 3.0 - f3 = 2.0 - r2 = 6 - r3 = 5 regID: r8 result: 2 Except: n regID: f1 result: ? Except: ? H T regID: r8 result: 2 Except: n regID: r3 result: ? Except: ? regID: f1 result: ? Except: ? Time H T regID: r8 result: 2 Except: n regID: r4 result: ? Except: ? regID: r3 result: 11 Except: N regID: f1 result: ? Except: ? r3 H T

  20. Reorder Buffer Example ROB Code Sequence f1 = f2 / f3 r3 = r2 + r3 r4 = r3 – r2 Initial Conditions - reorder buffer empty - f2 = 3.0 - f3 = 2.0 - r2 = 6 - r3 = 5 regID: r8 result: 2 Except: n regID: r4 result: 5 Except: n regID: r3 result: 11 Except: n regID: f1 result: ? Except: ? H T regID: r4 result: 5 Except: n regID: r8 result: 2 Except: n regID: r3 result: 11 Except: n regID: f1 result: ? Except: y Time H T regID: r4 result: 5 Except: n regID: r3 result: 11 Except: n regID: f1 result: ? Except: y H T

  21. Reorder Buffer Example ROB Code Sequence f1 = f2 / f3 r3 = r2 + r3 r4 = r3 – r2 Initial Conditions - reorder buffer empty - f2 = 3.0 - f3 = 2.0 - r2 = 6 - r3 = 5 H T first inst of fault handler Time H T

More Related