
Lecture 17: Core Design

jgroves

Presentation Transcript


  1. Lecture 17: Core Design • Today: implementing core structures – rename, issue queue, bypass networks; innovations for high ILP and clock speed

  2. Register Rename Logic [Figure: map table, free pool, dependence check logic, and mux; inputs are the logical source regs and logical dest regs, outputs are the physical source regs and physical dest regs]
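The rename datapath above can be sketched in Python. This is a minimal model under stated assumptions: a group of instructions is renamed in one cycle, every map-table read happens before the group's own writes, and the dependence check scans older instructions in the same group so the mux can override a stale map-table output. The function name `rename_group` and the tuple layout `(dest, src1, src2)` are hypothetical.

```python
def rename_group(group, map_table, free_pool):
    """Rename a group of instructions in one cycle (hypothetical helper).

    group: list of (dest, src1, src2) logical register names, oldest first.
    map_table: dict logical reg -> physical reg (modified in place).
    free_pool: list of free physical regs (modified in place).
    Returns a list of (phys_dest, phys_src1, phys_src2).
    """
    # All map-table reads happen in parallel, before any of this group's writes.
    table_srcs = [[map_table[s] for s in srcs] for _, *srcs in group]
    # Each instruction takes a fresh physical dest reg from the free pool.
    new_dests = [free_pool.pop(0) for _ in group]

    renamed = []
    for i, (_, *srcs) in enumerate(group):
        phys_srcs = []
        for j, s in enumerate(srcs):
            p = table_srcs[i][j]
            # Dependence check: if an older instruction in this group writes s,
            # the mux selects that instruction's new physical reg (youngest wins).
            for k in range(i - 1, -1, -1):
                if group[k][0] == s:
                    p = new_dests[k]
                    break
            phys_srcs.append(p)
        renamed.append((new_dests[i], *phys_srcs))

    # Write the new mappings; a later writer of the same reg overwrites an earlier one.
    for (dest, *_), p in zip(group, new_dests):
        map_table[dest] = p
    return renamed


map_table = {"r1": 1, "r2": 2, "r3": 3}
free_pool = [10, 11, 12]
group = [("r1", "r2", "r3"), ("r2", "r1", "r3"), ("r1", "r1", "r2")]
print(rename_group(group, map_table, free_pool))
# [(10, 2, 3), (11, 10, 3), (12, 10, 11)]
```

The second and third instructions read `r1` from the map table as p1, but the dependence check replaces that with p10, the register just allocated to the first instruction in the same group.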

  3. Map Table – RAM [Figure: each entry holds a 7-bit phys reg id; num entries = num logical regs; shadow copies (shift register)]
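A RAM-organized map table can be modeled as follows. This is a hedged sketch: it assumes 32 logical registers and 7-bit physical register ids (up to 128 physical registers), and it models the shadow copies as a stack of full-table snapshots taken at branches rather than as a bit-level shift register. The class and method names are hypothetical.

```python
class RamMapTable:
    """RAM-organized map table: one entry per logical register (hypothetical model).

    Each entry holds a physical register id (7 bits here, i.e. up to 128
    physical registers). Shadow copies are modeled as a stack of snapshots.
    """

    PHYS_BITS = 7

    def __init__(self, num_logical=32):
        self.table = list(range(num_logical))  # initially logical i -> physical i
        self.shadows = []                      # snapshots taken at branches

    def lookup(self, lreg):
        return self.table[lreg]                # a plain indexed read

    def update(self, lreg, preg):
        assert 0 <= preg < (1 << self.PHYS_BITS), "phys reg id must fit in 7 bits"
        self.table[lreg] = preg

    def checkpoint(self):
        self.shadows.append(list(self.table))  # copy the whole table
        return len(self.shadows) - 1

    def restore(self, cp):
        self.table = list(self.shadows[cp])
        del self.shadows[cp:]                  # discard this and all younger checkpoints


mt = RamMapTable()
mt.update(1, 40)
cp = mt.checkpoint()   # e.g. at a predicted branch
mt.update(1, 41)
mt.update(2, 42)
mt.restore(cp)         # the branch was mispredicted
print(mt.lookup(1), mt.lookup(2))  # 40 2
```

Lookup is a direct index, but each checkpoint must copy every entry (num logical regs × 7 bits).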

  4. Map Table – CAM [Figure: each entry holds a 5-bit logical reg id and a 1-bit valid flag, plus 1-bit shadow copies; num entries = num phys regs]
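For contrast, a CAM-organized map table keeps one entry per physical register, and a lookup is an associative search for the valid entry holding the logical id. The following minimal sketch assumes 128 physical and 32 logical registers. Because only the valid bit changes on a rename, it checkpoints only the valid bits (1 shadow bit per entry). The class name is hypothetical.

```python
class CamMapTable:
    """CAM-organized map table: one entry per physical register (hypothetical model).

    Entry p holds the 5-bit logical reg id currently mapped to p plus a valid
    bit. Only the valid bits are checkpointed.
    """

    def __init__(self, num_phys=128, num_logical=32):
        self.lreg = [None] * num_phys
        self.valid = [False] * num_phys
        for i in range(num_logical):           # initially logical i -> physical i
            self.lreg[i], self.valid[i] = i, True
        self.shadows = []

    def lookup(self, l):
        # Hardware compares l against every entry in parallel; the index of the
        # matching valid entry is the physical register.
        matches = [p for p, (r, v) in enumerate(zip(self.lreg, self.valid)) if v and r == l]
        assert len(matches) == 1
        return matches[0]

    def rename(self, l, new_p):
        self.valid[self.lookup(l)] = False     # old mapping becomes invalid
        self.lreg[new_p], self.valid[new_p] = l, True

    def checkpoint(self):
        self.shadows.append(list(self.valid))  # 1 shadow bit per entry
        return len(self.shadows) - 1

    def restore(self, cp):
        self.valid = list(self.shadows[cp])
        del self.shadows[cp:]


cam = CamMapTable()
cp = cam.checkpoint()
cam.rename(3, 100)
print(cam.lookup(3))  # 100
cam.restore(cp)
print(cam.lookup(3))  # 3
```

Restoring only the valid bits is enough in this model. A register allocated after the checkpoint was free at checkpoint time, and the old entry's logical id was only invalidated, not overwritten. This assumes the old physical register is not freed until the overwriting instruction commits.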

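  5. Wakeup Logic [Figure: each issue-queue entry holds a left and right source tag (tagL, tagR) with ready bits (rdyL, rdyR); each source tag is compared (=) with every broadcast result tag (tag1 … tagIW), and the comparator outputs are ORed to set the ready bit]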
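The comparator-and-OR structure can be written out directly. This is a minimal sketch, assuming each cycle broadcasts up to IW destination tags. The `IQEntry` and `wakeup` names are hypothetical, and issued entries are not removed from the window here.

```python
from dataclasses import dataclass


@dataclass
class IQEntry:
    tagL: int            # left source tag
    tagR: int            # right source tag
    rdyL: bool = False   # left operand ready
    rdyR: bool = False   # right operand ready

    def ready(self):
        return self.rdyL and self.rdyR


def wakeup(window, result_tags):
    """Broadcast up to IW result tags (tag1 ... tagIW) to every entry.

    Returns the indices of entries whose operands are all ready
    (these raise a request to the selection logic).
    """
    for e in window:
        # One comparator per (source tag, broadcast tag) pair; outputs are ORed.
        e.rdyL = e.rdyL or any(e.tagL == t for t in result_tags)
        e.rdyR = e.rdyR or any(e.tagR == t for t in result_tags)
    return [i for i, e in enumerate(window) if e.ready()]


window = [IQEntry(5, 6), IQEntry(5, 9, rdyR=True), IQEntry(7, 8)]
print(wakeup(window, [5]))     # [1]
print(wakeup(window, [6, 8]))  # [0, 1]
```

The comparator count is (window entries) × 2 × IW, which is why wakeup delay grows with both window size and issue width.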

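  6. Selection Logic [Figure: req signals from the issue window feed a tree of arbiter cells; each cell sends anyreq up the tree and receives an enable from above, and grant signals return to the winning entries] • For multiple FUs, sequential selectors are needed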
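A tree of arbiter cells, with sequential selectors for multiple FUs, can be sketched as follows. Assumptions: 4-input arbiter cells, fixed priority to the lowest index (e.g. oldest-first ordering), and a root that is always enabled, meaning an FU is free. All function names are hypothetical.

```python
def arbiter_cell(reqs, enable):
    """Grant the highest-priority (lowest-index) request, but only if enabled."""
    grants = [False] * len(reqs)
    if enable:
        for i, r in enumerate(reqs):
            if r:
                grants[i] = True
                break
    return grants


def select(reqs, fanin=4):
    """Tree of arbiter cells with `fanin` inputs each; returns a one-hot grant vector."""
    if len(reqs) <= fanin:
        return arbiter_cell(reqs, enable=True)      # root: enabled when an FU is free
    groups = [reqs[i:i + fanin] for i in range(0, len(reqs), fanin)]
    anyreq = [any(g) for g in groups]               # each subtree sends anyreq upward
    group_enable = select(anyreq, fanin)            # upper levels pick one subtree
    grants = []
    for g, en in zip(groups, group_enable):
        grants += arbiter_cell(g, en)               # enable flows back down
    return grants


def select_multiple(reqs, num_fus):
    """Sequential selectors: each one sees the requests minus earlier winners."""
    reqs = list(reqs)
    winners = []
    for _ in range(num_fus):
        grants = select(reqs)
        if not any(grants):
            break
        w = grants.index(True)
        winners.append(w)
        reqs[w] = False
    return winners


reqs = [i in (3, 5, 12) for i in range(16)]
print(select(reqs).index(True))   # 3
print(select_multiple(reqs, 2))   # [3, 5]
```

Each extra FU adds another full pass through the tree in series, which is why multiple-FU selection lengthens the critical path.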

  7. Structure Complexities • Critical structures: register map tables, issue queue, LSQ, register file, register bypass • Cycle time is heavily influenced by window size (number of physical registers) and issue width (number of FUs) • There is a conflict between the desire to increase IPC and the desire to increase clock speed • We can achieve both with large structures and deep pipelining, but some structures can't easily be pipelined, and long-latency structures can also hurt IPC

  8. Deep Pipelines • What does it mean to have a 2-cycle wakeup, a 2-cycle bypass, or a 2-cycle regread?
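One way to see the cost of a 2-cycle wakeup is this toy timing model, not a cycle-accurate simulator. It assumes a chain of dependent instructions where each consumer can issue only after its producer's result is available (`op_latency`) and its tag has woken the consumer (`wakeup_latency`). The function name and parameters are hypothetical.

```python
def chain_issue_cycles(n, op_latency=1, wakeup_latency=1):
    """Issue cycle of each instruction in a chain of n dependent ops (toy model).

    A consumer issues once the producer's result is ready and the wakeup has
    completed, so consecutive issues are max(op_latency, wakeup_latency) apart.
    """
    gap = max(op_latency, wakeup_latency)
    return [i * gap for i in range(n)]


print(chain_issue_cycles(4))                    # [0, 1, 2, 3]
print(chain_issue_cycles(4, wakeup_latency=2))  # [0, 2, 4, 6]
```

With a 2-cycle wakeup, a chain of single-cycle ops runs at half rate. Independent instructions in this model can still issue every cycle, so the loss only shows up when there is little ILP.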

  9. Frequency Scaling Options [Figure: Pipeline scaling keeps the 20-entry IQ and 40 regs but accepts 2-cycle wakeup, 2-cycle regread, and 2-cycle bypass; capacity scaling shrinks to a 15-entry IQ and 30 regs; replicated capacity scaling uses multiple copies of a 15-entry IQ with 30 regs, each with its own FUs]

  10. Recent Trends • Not much change in structure capacities • Not much change in cycle time • Pipeline depths have become shorter (circuit delays have reduced); this is good for energy efficiency • Optimal performance is observed at about 50 pipeline stages (we are currently at ~20 stages for energy reasons) • Deep pipelines improve parallelism (which helps when there is ILP), but they also increase the gap between dependent instructions (which hurts when there is little ILP)

  11. ILP Limits [Figure from Wall, 1993]

  12. Techniques for High ILP • Better branch prediction and fetch (trace cache) → cascading branch predictors? • More physical registers, ROB, issue queue, LSQ → two-level regfile/IQ? • Higher issue width → clustering? • Lower average cache hierarchy access time • Memory dependence prediction • Latency tolerance techniques: ILP, MLP, prefetch, runahead, multi-threading

  13. Impact of Mem-Dep Prediction • In the perfect model, loads wait only for conflicting stores; in the naïve model, loads issue speculatively and must be squashed if a dependence is later discovered [Figure from Chrysos and Emer, ISCA'98]
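A predictor sits between the naïve and perfect models. The Chrysos and Emer paper is known for store sets, and the following is a simplified Python sketch of that idea, not their exact design. The tagless tables are modeled as dicts, the merge rule is reduced to "the smaller set id wins", and the class and method names are hypothetical.

```python
class StoreSets:
    """Simplified store-set memory dependence predictor (hypothetical model).

    ssit: load/store PC -> store set id (SSIT), modeled as a dict.
    lfst: store set id -> inum of the last fetched, not-yet-issued store (LFST).
    """

    def __init__(self):
        self.ssit = {}
        self.lfst = {}
        self.next_id = 0

    def report_violation(self, load_pc, store_pc):
        """A load issued before an older store to the same address: group them."""
        ids = [self.ssit[pc] for pc in (load_pc, store_pc) if pc in self.ssit]
        if ids:
            ssid = min(ids)          # simplified merge: the smaller id wins
        else:
            ssid = self.next_id
            self.next_id += 1
        self.ssit[load_pc] = self.ssit[store_pc] = ssid

    def fetch_load(self, pc):
        """Store inum this load must wait for, or None (issue speculatively)."""
        ssid = self.ssit.get(pc)
        return None if ssid is None else self.lfst.get(ssid)

    def fetch_store(self, pc, inum):
        """Make this store the last fetched in its set; return the store it follows."""
        ssid = self.ssit.get(pc)
        if ssid is None:
            return None
        prev = self.lfst.get(ssid)
        self.lfst[ssid] = inum
        return prev

    def store_issued(self, pc, inum):
        ssid = self.ssit.get(pc)
        if ssid is not None and self.lfst.get(ssid) == inum:
            del self.lfst[ssid]


ss = StoreSets()
print(ss.fetch_load(0x40))                        # None: no history, speculate
ss.report_violation(load_pc=0x40, store_pc=0x10)  # squash and train
ss.fetch_store(0x10, inum=7)
print(ss.fetch_load(0x40))                        # 7: wait for store 7
ss.store_issued(0x10, inum=7)
print(ss.fetch_load(0x40))                        # None
```

Untrained loads behave like the naïve model. After one violation, the load waits only for stores in its set, which approximates the perfect model's "wait only for conflicting stores".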

  14. Clustering [Figure: register rename & instruction steering feed two clusters, each with its own IQ, register file (40 regs in each cluster), and FUs] • Example:
      r1 ← r2 + r3     p21 ← p2 + p3
      r4 ← r1 + r2     p22 ← p21 + p2
      r5 ← r6 + r7     p41 ← p56 + p57
      r8 ← r1 + r5     p42 ← p21 (copy); p43 ← p42 + p41
    • r1 is mapped to both p21 and p42, which influences steering and instruction commit • On average, only 8 registers are replicated
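The clustering example can be reproduced with a small rename model that inserts a copy whenever a source lives only in the other cluster. Assumptions in this sketch: the steering decision for each instruction is given as input rather than computed, each cluster allocates from its own free list (p1 to p40 and p41 to p80), and physical registers are never freed because commit is not modeled. The function name is hypothetical.

```python
def rename_clustered(instrs, steer, rmap, free):
    """Rename for a 2-cluster machine with per-cluster register files (hypothetical).

    instrs: list of (dst, src1, src2) logical regs; steer: chosen cluster per instr.
    rmap: logical reg -> {cluster: phys reg}; free: cluster -> list of free phys regs.
    Returns the renamed instructions (including inserted copies) as strings.
    """
    out = []
    for (dst, *srcs), c in zip(instrs, steer):
        phys_srcs = []
        for s in srcs:
            if c not in rmap[s]:
                # The value lives only in the other cluster: insert a copy.
                remote = next(iter(rmap[s].values()))
                local = free[c].pop(0)
                out.append(f"p{local} <- p{remote}")
                rmap[s][c] = local            # s is now mapped in both clusters
            phys_srcs.append(rmap[s][c])
        new = free[c].pop(0)
        rmap[dst] = {c: new}                  # a new definition replaces all copies
        out.append(f"p{new} <- " + " + ".join(f"p{p}" for p in phys_srcs))
    return out


rmap = {"r2": {0: 2}, "r3": {0: 3}, "r6": {1: 56}, "r7": {1: 57}}
free = {0: [21, 22], 1: [41, 42, 43]}
prog = [("r1", "r2", "r3"), ("r4", "r1", "r2"), ("r5", "r6", "r7"), ("r8", "r1", "r5")]
for line in rename_clustered(prog, [0, 0, 1, 1], rmap, free):
    print(line)
print(rmap["r1"])  # {0: 21, 1: 42}
```

Steering the last instruction to cluster 1 avoids copying r5 but forces a copy of r1. That is how replication arises and why steering policy matters.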

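  15. 2Bc-gskew Branch Predictor [Figure: four tables: BIM, indexed by the branch address, and G0, G1, and Meta, indexed by address + history; BIM, G0, and G1 vote, and Meta chooses between the BIM prediction and the vote to form the final prediction] • 44 KB; 2-cycle access; used in the Alpha 21464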

  16. Rules • On a correct prediction: if all predictors agree, no update; if they disagree, strengthen the correct predictors and the chooser • On a misprediction: update the chooser and recompute the prediction; if the recomputed prediction is correct, strengthen the correct predictors; if it is still a misprediction, update all predictors
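The table organization and update rules can be combined into a toy model. This is a hedged sketch, not the Alpha 21464 implementation. Assumptions: all tables hold 2-bit saturating counters, Meta ≥ 2 selects the majority vote (otherwise BIM), the index hashes are made up rather than the real skewing functions, and the chooser is trained only when BIM and the vote disagree. All names are hypothetical.

```python
def bump(c, up):
    """2-bit saturating counter update."""
    return min(c + 1, 3) if up else max(c - 1, 0)


def taken(c):
    return c >= 2


class TwoBcGskew:
    """Toy 2Bc-gskew predictor: BIM, G0, G1 and Meta tables of 2-bit counters."""

    def __init__(self, log_entries=10, hist_bits=12):
        n = 1 << log_entries
        self.mask = n - 1
        self.hist_mask = (1 << hist_bits) - 1
        self.bim, self.g0, self.g1 = [2] * n, [2] * n, [2] * n  # weakly taken
        self.meta = [1] * n   # meta >= 2 selects the vote, else BIM (assumed encoding)
        self.hist = 0

    def _lookup(self, pc):
        h = self.hist
        # Hypothetical index hashes; the real design uses skewing functions.
        idx = (pc & self.mask,
               (pc ^ h) & self.mask,
               (pc ^ (h << 1) ^ (h >> 3)) & self.mask,
               (pc ^ (h >> 1)) & self.mask)
        b = taken(self.bim[idx[0]])
        p0 = taken(self.g0[idx[1]])
        p1 = taken(self.g1[idx[2]])
        vote = (b + p0 + p1) >= 2            # majority of BIM, G0, G1
        return idx, b, p0, p1, vote

    def predict(self, pc):
        idx, b, _, _, vote = self._lookup(pc)
        return vote if taken(self.meta[idx[3]]) else b

    def update(self, pc, outcome):
        idx, b, p0, p1, vote = self._lookup(pc)
        im = idx[3]
        tables = ((self.bim, idx[0], b), (self.g0, idx[1], p0), (self.g1, idx[2], p1))

        def strengthen_correct():
            for t, i, p in tables:
                if p == outcome:
                    t[i] = bump(t[i], outcome)

        def train_chooser():
            if b != vote:                    # chooser only learns when its inputs differ
                self.meta[im] = bump(self.meta[im], vote == outcome)

        pred = vote if taken(self.meta[im]) else b
        if pred == outcome:
            if not (b == p0 == p1):          # all agree: no update
                strengthen_correct()
                train_chooser()
        else:
            train_chooser()
            new_pred = vote if taken(self.meta[im]) else b   # recompute
            if new_pred == outcome:
                strengthen_correct()
            else:
                for t, i, _ in tables:       # still wrong: update all predictors
                    t[i] = bump(t[i], outcome)
        self.hist = ((self.hist << 1) | int(outcome)) & self.hist_mask
```

The partial-update policy leaves agreeing counters untouched. Only the chooser and the components that disagreed are trained, which limits aliasing damage in the shared tables.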

  17. Runahead (Mutlu et al., HPCA'03) [Figure: trace cache, current rename, issue queue, regfile (128), checkpointed regfile (32), ROB, L1 D-cache, runahead cache, retired rename, FUs] • When the oldest instruction is a cache miss, behave as if it caused a context switch: • checkpoint the committed registers, rename table, return address stack, and branch history register • assume a bogus value and start a new thread • this thread cannot modify program state, but it can prefetch
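The runahead idea can be sketched in Python. This toy model makes several assumptions. A hypothetical `INV` marker stands in for the bogus value. Only loads and adds are modeled, and stores and the runahead cache are omitted. Any load whose address is valid but misses is recorded as a prefetch instead of being waited on. Function and field names are hypothetical.

```python
INV = None  # hypothetical marker for a bogus ("invalid") value


def runahead(regs, miss_dst, trace, cache, memory):
    """Pre-execute `trace` past a blocking load miss that writes `miss_dst`.

    regs: dict reg -> value; trace: list of ("load", dst, addr_reg) or
    ("add", dst, src1, src2). Returns (checkpoint, prefetch addresses).
    """
    checkpoint = dict(regs)   # committed state, restored when the miss returns
    spec = dict(regs)         # runahead thread's private copy
    spec[miss_dst] = INV      # assume a bogus value for the missing load
    prefetches = []
    for op, dst, *srcs in trace:
        vals = [spec[s] for s in srcs]
        if any(v is INV for v in vals):
            spec[dst] = INV                   # bogus values propagate to dependents
        elif op == "load":
            addr = vals[0]
            if addr in cache:
                spec[dst] = memory[addr]
            else:
                prefetches.append(addr)       # independent miss: issue it as a prefetch
                spec[dst] = INV               # don't wait for it
        elif op == "add":
            spec[dst] = vals[0] + vals[1]
    # spec is discarded: runahead cannot modify program state
    return checkpoint, prefetches


regs = {"r1": 100, "r2": 200, "r3": 0}
trace = [("load", "r4", "r3"),        # depends on the missing load -> INV
         ("load", "r5", "r2"),        # address 200 misses -> prefetch
         ("add", "r6", "r1", "r2"),   # 300
         ("load", "r7", "r6"),        # address 300 misses -> prefetch
         ("load", "r8", "r1")]        # address 100 hits
cp, pf = runahead(regs, "r3", trace, cache={100}, memory={100: 5})
print(pf)  # [200, 300]
```

The payoff is the two independent misses discovered while the original miss is outstanding. Instructions that depend on the miss produce nothing useful.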

  18. Memory Bottlenecks • 128-entry window, real L2: 0.77 IPC • 128-entry window, perfect L2: 1.69 IPC • 2048-entry window, real L2: 1.15 IPC • 2048-entry window, perfect L2: 2.02 IPC • 128-entry window, real L2, runahead: 0.94 IPC
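The IPC numbers above can be turned into relative speedups with a few lines of arithmetic. This snippet only reuses the slide's figures. The configuration labels are shorthand, and the "fraction recovered" metric is an illustrative ratio of IPC gains, not a figure from the source.

```python
base = 0.77  # 128-entry window, real L2
configs = {
    "128-entry, perfect L2": 1.69,
    "2048-entry, real L2": 1.15,
    "2048-entry, perfect L2": 2.02,
    "128-entry, real L2, runahead": 0.94,
}
speedup = {name: ipc / base for name, ipc in configs.items()}
for name, s in speedup.items():
    print(f"{name}: {s:.2f}x")

# Fraction of the 2048-entry window's IPC gain that runahead achieves with 128 entries
recovered = (0.94 - base) / (1.15 - base)
print(f"runahead recovers {recovered:.0%} of the big-window gain")
```

Runahead gives about a 1.22x speedup over the baseline, close to half the gain of a 16× larger window, while a perfect L2 alone gives about 2.19x.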

