Trace-Level Speculative Multithreaded Architecture

ICCD´02, Freiburg (Germany) - September 16-18, 2002. Trace-Level Speculative Multithreaded Architecture. Carlos Molina Universitat Rovira i Virgili – Tarragona, Spain cmolina@etse.urv.es Antonio González and Jordi Tubella

Presentation Transcript


  1. ICCD´02, Freiburg (Germany) - September 16-18, 2002 • Trace-Level Speculative Multithreaded Architecture • Carlos Molina, Universitat Rovira i Virgili – Tarragona, Spain, cmolina@etse.urv.es • Antonio González and Jordi Tubella, Universitat Politècnica de Catalunya – Barcelona, Spain, {antonio,jordit}@ac.upc.es

  2. Outline • Motivation • Related Work • TSMA • Performance Results • Conclusions

  3. Motivation • Two techniques to avoid serialization caused by data dependences • Data Value Speculation • Data Value Reuse • Speculation predicts values based on past behavior • Reuse is possible if the same computation has been done in the past • Both may be considered at two levels • Instruction Level • Trace Level

  4. Trace Level Reuse • [Figure labels: Static, Dynamic] • A set of instructions can be skipped in a row • These instructions do not need to be fetched • The live input test is not easy to handle

  5. Trace Level Speculation • Two variants: with live input test and with live output test • Solves the live input test problem • Introduces penalties due to misspeculations • Two orthogonal issues • microarchitecture support for trace speculation • control and data speculation techniques • prediction of initial and final points • prediction of live output values

  6. Trace Level Speculation with Live Input Test • [Figure: timeline of the ST and NST threads. Labels: live output actualization & trace speculation; instruction execution; not executed; live input validation & instruction execution; miss trace speculation detection & recovery actions]

  7. Trace Level Speculation with Live Output Test • [Figure: timeline of the ST and NST threads with two buffers. Labels: live output actualization & trace speculation; instruction execution; not executed; live output validation; miss trace speculation detection & recovery actions]

  8. Related Work • Trace Level Reuse • Basic blocks (Huang and Lilja, 99) • General traces (González et al., 99) • Traces with compiler support (Connors and Hwu, 99) • Trace Level Speculation • DIVA (Austin, 99) • Slipstream processors (Rotenberg et al., 99) • Pre-execution (Sohi et al., 01) • Precomputation (Shen et al., 01) • Nearby and distant ILP (Balasubramonian et al., 01)

  9. TSMA • [Block diagram. Shared components: Fetch Engine, I Cache, Branch Predictor, Decode & Rename, Functional Units, Trace Speculation, Look Ahead Buffer, Verification Engine, Data Cache. Per-thread components (one each for ST and NST): I Window, Ld/St Queue, Reorder Buffer, Arch. Register File. Data caches: L1SDC, L1NSDC, L2NSDC]

  10. Trace Speculation Engine • Two issues must be handled • implementing a trace level predictor • communicating a trace speculation opportunity • Trace level predictor • PC-indexed table with N entries • Each entry contains • live output values • final program counter of the trace • Trace speculation communication • INI_TRACE instruction • Additional MOVE instructions
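
The PC-indexed predictor table on this slide can be illustrated with a minimal sketch. The names `TraceEntry` and `TracePredictor`, the table size, the index function, and the full-PC tag used to reject aliased entries are all assumptions made for illustration; the slides only state that the table is PC-indexed and that each entry holds live output values and the final program counter.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TraceEntry:
    tag: int        # full starting PC (assumption: used to detect aliasing)
    final_pc: int   # predicted program counter at which the trace ends
    live_outputs: Dict[int, int] = field(default_factory=dict)  # reg id -> value


class TracePredictor:
    """Hypothetical direct-mapped, PC-indexed table with N entries."""

    def __init__(self, n_entries: int = 512):
        self.n_entries = n_entries
        self.table: List[Optional[TraceEntry]] = [None] * n_entries

    def _index(self, pc: int) -> int:
        # Assumes 4-byte instructions, so the two low PC bits carry no information.
        return (pc >> 2) % self.n_entries

    def update(self, pc: int, final_pc: int, live_outputs: Dict[int, int]) -> None:
        self.table[self._index(pc)] = TraceEntry(pc, final_pc, dict(live_outputs))

    def lookup(self, pc: int) -> Optional[TraceEntry]:
        entry = self.table[self._index(pc)]
        return entry if entry is not None and entry.tag == pc else None
```

A hit in `lookup` would correspond to a trace speculation opportunity: the predicted live outputs are written and fetch continues at `final_pc`.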

  11. Look Ahead Buffer • First-input first-output queue • Stores instructions executed by ST • The fields of each entry are: • Program Counter • Operation Type: indicates memory operation • Source register Id 1 & source value 1 • Source register Id 2 & source value 2 • Destination register Id & destination value • Memory address
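
The entry layout listed on this slide maps naturally onto a record type held in a FIFO queue. The sketch below is only an illustration: the field names, the string encoding of the operation type, the capacity, and the behavior when the buffer is full are assumptions, not details given in the slides.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional


@dataclass
class LABEntry:
    pc: int
    op_type: str                     # hypothetical encoding: "alu", "load", "store"
    src1_id: Optional[int] = None
    src1_val: Optional[int] = None
    src2_id: Optional[int] = None
    src2_val: Optional[int] = None
    dest_id: Optional[int] = None
    dest_val: Optional[int] = None
    mem_addr: Optional[int] = None   # only meaningful for memory operations


class LookAheadBuffer:
    """First-in first-out queue of instructions executed by the ST."""

    def __init__(self, capacity: int = 128):   # capacity is a hypothetical value
        self.capacity = capacity
        self.queue: Deque[LABEntry] = deque()

    def push(self, entry: LABEntry) -> bool:
        # Returns False when full (assumption: the ST would have to stall).
        if len(self.queue) >= self.capacity:
            return False
        self.queue.append(entry)
        return True

    def pop(self) -> Optional[LABEntry]:
        # The oldest entry leaves first, preserving program order for verification.
        return self.queue.popleft() if self.queue else None

    def flush(self) -> None:
        self.queue.clear()
```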

  12. Verification Engine • Validates speculated instructions • Maintains the non-speculative state • Consumes instructions from the LAB • Test is performed as follows: • the source values of instruction I are tested against the non-speculative state • if they match, the destination value of I may be updated • memory operations check the effective address • store instructions update memory, the rest update registers • Hardware required is minimal
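
The verification test described on this slide can be sketched as a single function over the non-speculative state. This is a simplified model under stated assumptions: the `Entry` record and the dictionary-based register file and memory are hypothetical, and the memory check for loads is interpreted here as comparing the value the ST loaded against non-speculative memory at the same address, which the slides do not spell out.

```python
from typing import Dict, NamedTuple, Optional


class Entry(NamedTuple):
    op: str                     # hypothetical encoding: "alu", "load", "store"
    srcs: Dict[int, int]        # source register id -> value the ST used
    dest: Optional[int]         # destination register id (None for stores)
    value: int                  # value the ST produced, loaded, or stored
    addr: Optional[int] = None  # effective address for memory operations


def verify(entry: Entry, regs: Dict[int, int], mem: Dict[int, int]) -> bool:
    """Validate one speculated instruction; update non-speculative state on success."""
    # Source values used by the ST must match the non-speculative registers.
    if any(regs.get(r, 0) != v for r, v in entry.srcs.items()):
        return False
    # Assumption: a load is also checked against non-speculative memory.
    if entry.op == "load" and mem.get(entry.addr, 0) != entry.value:
        return False
    # Stores update memory; every other instruction updates a register.
    if entry.op == "store":
        mem[entry.addr] = entry.value
    else:
        regs[entry.dest] = entry.value
    return True
```

A `False` result would correspond to a detected trace misspeculation, which triggers the recovery actions of thread synchronization.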

  13. Thread Synchronization • Handles trace mispredictions • Recovery actions involved are: • Instruction execution is stopped • ST structures are emptied (IW, LSQ, ROB, LAB) • Speculative cache and ST register file are invalidated • Two types of synchronization • Total (occurs when NST is not executing instructions) • Penalty due to refilling the pipeline • Partial (occurs when NST is executing instructions) • No penalty • NST takes the role of ST
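
The recovery actions and the two synchronization types can be summarized in a short sketch. The function name, its arguments, and the use of plain lists and dictionaries for the hardware structures are assumptions made for illustration only.

```python
from typing import Dict, List


def synchronize(st_structures: Dict[str, List[int]],
                l1sdc: Dict[int, int],
                st_regs: Dict[int, int],
                nst_executing: bool) -> str:
    """Apply the recovery actions and report which synchronization type occurred."""
    # Empty the ST structures (IW, LSQ, ROB, LAB).
    for structure in st_structures.values():
        structure.clear()
    # Invalidate the speculative cache and the ST register file.
    l1sdc.clear()
    st_regs.clear()
    # Partial: the NST is already executing and takes the role of the ST (no penalty).
    # Total: no thread is executing, so the pipeline must be refilled.
    return "partial" if nst_executing else "total"
```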

  14. Memory Subsystem • Maintains memory state • speculative • non-speculative • An additional small first-level cache (L1SDC) is added to maintain the speculative memory state • The traditional memory subsystem (L1NSDC, L2NSDC) is supported • Rules (numbered 1 to 5 on the slide) • NST stores update values and allocate space in the NS caches • ST stores update values in L1SDC only • ST loads get values from L1SDC; on a miss, they get them from the NS caches • NST loads get values and allocate space in the NS caches • A line replaced in L1NSDC is copied back to L2NSDC
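
The load and store rules for the two threads can be modeled with two dictionaries. This is a deliberately simplified sketch: the class and method names are hypothetical, the non-speculative hierarchy (L1NSDC and L2NSDC) is collapsed into a single dictionary, and capacity, allocation, and line replacement are not modeled.

```python
from typing import Dict


class MemorySubsystem:
    def __init__(self) -> None:
        self.l1sdc: Dict[int, int] = {}  # speculative state, written by the ST only
        self.ns: Dict[int, int] = {}     # non-speculative state (L1NSDC + L2NSDC collapsed)

    def st_store(self, addr: int, value: int) -> None:
        # ST stores update the speculative cache only.
        self.l1sdc[addr] = value

    def st_load(self, addr: int) -> int:
        # ST loads read L1SDC first and fall back to the non-speculative caches.
        if addr in self.l1sdc:
            return self.l1sdc[addr]
        return self.ns.get(addr, 0)

    def nst_store(self, addr: int, value: int) -> None:
        # NST stores update the non-speculative caches.
        self.ns[addr] = value

    def nst_load(self, addr: int) -> int:
        # NST loads never observe speculative values.
        return self.ns.get(addr, 0)

    def invalidate_speculative(self) -> None:
        # Recovery after a misspeculation discards all speculative memory state.
        self.l1sdc.clear()
```

Keeping speculative stores out of the non-speculative caches is what allows recovery to be a simple invalidation of L1SDC.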

  15. Register File • Slight modification to permit prompt execution • Register map table contains for each entry: • Committed Value • ROB Tag • Counter • Counter field is maintained as follows: • A new ST instruction increases the counter of its destination register • The counter is decreased when an ST instruction is committed • After trace speculation, counters are no longer increased • But they keep being decreased until they reach zero
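
The counter maintenance rules can be sketched as follows. The class and method names and the register count are hypothetical, and the model assumes in-order commit, so that instructions dispatched before the trace speculation commit before any later ones.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MapEntry:
    committed_value: int = 0
    rob_tag: Optional[int] = None
    counter: int = 0   # in-flight ST writers dispatched before trace speculation


class RegisterMapTable:
    def __init__(self, n_regs: int = 32):
        self.entries: List[MapEntry] = [MapEntry() for _ in range(n_regs)]
        self.trace_speculated = False

    def dispatch(self, dest: int, rob_tag: int) -> None:
        entry = self.entries[dest]
        entry.rob_tag = rob_tag
        # Counters stop increasing once a trace has been speculated.
        if not self.trace_speculated:
            entry.counter += 1

    def commit(self, dest: int, value: int) -> None:
        entry = self.entries[dest]
        entry.committed_value = value
        # Counters keep decreasing until they reach zero.
        if entry.counter > 0:
            entry.counter -= 1

    def start_trace_speculation(self) -> None:
        self.trace_speculated = True
```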

  16. Working Example • [Figure: timeline across ST, NST and VE with numbered steps 1 to 10. Events shown: live output actualization & trace speculation; NST begins execution; NST executes speculated trace; VE validates instructions; ST begins execution; VE begins verification; VE finishes verification; NST executes some additional instructions; NST execution. Legend: instruction execution, not executed, live output validation]

  17. Experimental Framework • Simulator: Alpha version of the SimpleScalar Toolset • Benchmarks: Spec95 • Maximum optimization level: DEC C & F77 compilers with -non_shared -O5 • Statistics: collected for 125 million instructions, skipping initializations

  18. Base Microarchitecture

  19. TSMA Additional Structures

  20. Performance Evaluation • Main objective: • trace misspeculations cause minor penalties • Traces are built following a simple rule • from backward branch to backward branch • minimum and maximum size of 8 and 64 respectively • Simple Trace Predictor is evaluated • Stride + Context Value (history of 9) • Results provided • Percentage of misspeculations • Percentage of predicted instructions • Speedup
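
The trace-building rule (backward branch to backward branch, with sizes between 8 and 64) can be sketched over a dynamic instruction stream. How the size limits interact with branch boundaries is not stated in the slides; this sketch assumes a trace closes at a backward branch only once it holds at least `min_size` instructions, and is cut unconditionally at `max_size`. The function name and the `(pc, target)` stream format are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def build_traces(stream: Iterable[Tuple[int, Optional[int]]],
                 min_size: int = 8,
                 max_size: int = 64) -> List[List[int]]:
    """Split a dynamic stream of (pc, taken_branch_target_or_None) into traces."""
    traces: List[List[int]] = []
    current: List[int] = []
    for pc, target in stream:
        current.append(pc)
        backward = target is not None and target <= pc
        # Close the trace at a backward branch (if long enough) or at the size limit.
        if (backward and len(current) >= min_size) or len(current) == max_size:
            traces.append(current)
            current = []
    # Instructions after the last closed trace are dropped as an incomplete trace.
    return traces
```

For example, a four-instruction loop body ending in a backward branch yields traces that span two iterations each, because a single iteration is shorter than the minimum size.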

  21. Misspeculations • [Chart: percentage of misspeculations; axis from 0 to 100]

  22. Predicted Instructions • [Chart: percentage of predicted instructions; axis from 0 to 50]

  23. Speedup • [Chart: speedup; axis from 1.00 to 1.35]

  24. Conclusions • TSMA • designed to exploit trace-level speculation • Special emphasis on • minimizing misspeculation penalties • Results show: • architecture is tolerant to misspeculations • speedup of 16% with a predictor that misses 70%

  25. Future Work • Aggressive trace level predictors • bigger traces • better value predictors • Generalization to multiple threads • cascade execution • Mixing prediction & execution • speculated traces do not need to be fully speculated
