Computer Architecture Principles. Dr. Mike Frank. CDA 5155, Summer 2003. Module #24: Speculation.
What’s Speculation?
• Unconditional early execution of an instruction that is expected to be needed (based on predicted branch outcome), but that may not be.
• What makes this difficult?
    • Instruction may raise fatal (non-resumable) exceptions that shouldn’t have been raised.
    • Instruction may have side effects that affect the data-flow of later instructions that shouldn’t be affected.
    • The conservative approach (never speculate under these conditions) is overly constraining.
A Simple Speculation Example
• C source: if (A==0) A=B; else A=A+4;
• A in 0(R3), B in 0(R2), R14 available
• Original assembly:
        LD    R1,0(R3)
        BNEZ  R1,L1
        LD    R1,0(R2)
        J     L2
    L1: DADDI R1,R1,#4
    L2: SD    R1,0(R3)
• With speculative load:
        LD    R1,0(R3)
        LD    R14,0(R2)
        BEQZ  R1,L3
        DADDI R14,R1,#4
    L3: SD    R14,0(R3)
• Note that this simple transformation does not preserve exception behavior!
• Note that the “then” clause is now effectively unconditional. (Equivalent C code: T=B; if (A!=0) T=A+4; A=T;)
• Note the use of the extra register R14.
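The exception hazard in this transformation can be seen in a small Python model. This is only a sketch under stated assumptions: memory is a dict, a missing address stands in for a faulting load, and the names `Fault`, `load`, `original`, and `speculative` are made up for illustration.

```python
class Fault(Exception):
    """Stands in for a fatal (non-resumable) memory exception."""


def load(mem, addr):
    # Assumption of this sketch: a missing address models a faulting access.
    if addr not in mem:
        raise Fault(addr)
    return mem[addr]


def original(mem, a_addr, b_addr):
    # if (A == 0) A = B; else A = A + 4;
    a = load(mem, a_addr)
    if a == 0:
        a = load(mem, b_addr)   # B is read only on the "then" path
    else:
        a = a + 4
    mem[a_addr] = a


def speculative(mem, a_addr, b_addr):
    # T = B; if (A != 0) T = A + 4; A = T;
    a = load(mem, a_addr)
    t = load(mem, b_addr)       # hoisted load: runs on both paths
    if a != 0:
        t = a + 4
    mem[a_addr] = t


# Both versions agree when B is readable.
m1, m2 = {"A": 0, "B": 7}, {"A": 0, "B": 7}
original(m1, "A", "B")
speculative(m2, "A", "B")
same_result = (m1 == m2)

# Only the speculative version faults when A != 0 and B is unreadable.
m3, m4 = {"A": 5}, {"A": 5}
original(m3, "A", "B")
try:
    speculative(m4, "A", "B")
    spurious_fault = False
except Fault:
    spurious_fault = True
```

The second case is the "false positive" that the more ambitious methods try to avoid: the original program never touches B when A is nonzero, yet the hoisted load faults anyway.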
Ambitious Speculation Methods
Here are some alternatives:
• Hardware and OS cooperatively ignore (or delay) exceptions for speculative instructions.
• Poison bits mark register values written by speculative instructions that generated exceptions.
• Results of speculative instructions are buffered (not committed) until the speculative branch prediction is confirmed. (Sentinel method.)
HW/SW-cooperation Method
• A way of coping with non-resumable exceptions in speculative instructions.
• Basic strategy: Simply ignore fatal errors in any speculative instructions.
    • Correct programs will never generate such errors anyway, so, no problem (no “false positives”).
    • But, incorrect programs may silently go haywire!
        • Treated as an unavoidable cost of optimization
        • A kind of imprecise exception handling
• In this case, if a program misbehaves in testing, one could always recompile it with strict exception handling (& no speculation) to track down the error.
Example: Speculative Load Inst.
• Previous example, with special “Speculative Load” (sLD) instruction:
• This version does not preserve exception behavior, but at least avoids false positives:
        LD    R1,0(R3)
        sLD   R14,0(R2)
        BEQZ  R1,L3
        DADDI R14,R1,#4
    L3: SD    R14,0(R3)
• Using a separate “speculation check” (SPECCK) instruction to restore correct exception behavior:
        LD     R1,0(R3)
        sLD    R14,0(R2)
        BNEZ   R1,L1
        SPECCK 0(R2)
        J      L2
    L1: DADDI  R14,R1,#4
    L2: SD     R14,0(R3)
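The difference between the two sLD versions can be sketched in Python. The semantics here are assumptions made for illustration, not a definition of any real ISA: `sld` suppresses a fault and returns an arbitrary value (0), and `specck` simply re-checks the address and raises the fault that was suppressed. All function names are hypothetical.

```python
class Fault(Exception):
    """Stands in for a fatal (non-resumable) memory exception."""


def ld(mem, addr):
    # Ordinary load: a missing address models a faulting access.
    if addr not in mem:
        raise Fault(addr)
    return mem[addr]


def sld(mem, addr):
    # Speculative load (assumed behavior): ignore the fault, return junk (0).
    return mem.get(addr, 0)


def specck(mem, addr):
    # Speculation check (assumed behavior): raise the fault sLD ignored.
    if addr not in mem:
        raise Fault(addr)


def run_without_check(mem, a_addr, b_addr):
    r1 = ld(mem, a_addr)
    r14 = sld(mem, b_addr)
    if r1 != 0:                 # BEQZ R1,L3 not taken
        r14 = r1 + 4
    mem[a_addr] = r14


def run_with_check(mem, a_addr, b_addr):
    r1 = ld(mem, a_addr)
    r14 = sld(mem, b_addr)
    if r1 == 0:                 # BNEZ R1,L1 not taken
        specck(mem, b_addr)     # B really was needed, so check it now
    else:
        r14 = r1 + 4
    mem[a_addr] = r14


# A != 0, B unreadable: neither version raises a false positive.
ok1, ok2 = {"A": 5}, {"A": 5}
run_without_check(ok1, "A", "B")
run_with_check(ok2, "A", "B")

# A == 0, B unreadable: the unchecked version silently stores junk...
bad = {"A": 0}
run_without_check(bad, "A", "B")

# ...while the checked version raises the fault the original program would have.
try:
    run_with_check({"A": 0}, "A", "B")
    fault_restored = False
except Fault:
    fault_restored = True
```

The unchecked run on `bad` is the "silently go haywire" case from the HW/SW-cooperation slide: an incorrect program continues with a meaningless value instead of stopping.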
Poison Bits
• Speculative instructions are marked as such.
    • Like the “sLD” instruction we saw earlier.
• Each ISA register has an associated “poison bit.”
• When a speculative inst. generates a fatal exception, then, instead of invoking exception handling, the destination register is marked as “poison.”
• Poison is propagated through data dependencies of subsequent speculative instructions.
• If a non-speculative instruction ever uses a poisoned register, then that instruction generates a fatal exception which halts the program.
• All fatal exceptions do eventually occur, but maybe a bit late vs. normally. (Still pretty easy to debug, though.)
Poison Bit Example
• C source: if (A==0) A=B+8; else A=A+4;
        LD     R1,0(R3)     ; load A non-speculatively
        sLD    R12,0(R2)    ; exception on 0(R2) may poison R12
        sDADDI R14,R12,#8   ; R14 inherits poison
        BEQZ   R1,L3        ; skip next line if A=0
        DADDI  R14,R1,#4    ; clears R14 poison bit
    L3: SD     R14,0(R3)    ; exception happens here
• Note: if accessing B causes an exception, it is still raised (though late), and only if the “then” clause runs.
[Figure: poison bits for R12 and R14]
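The poison-bit rules can be traced with a toy register machine in Python. This is a minimal sketch, assuming symbolic addresses ("A", "B"), a missing address as the faulting case, and hypothetical method names (`sld`, `saddi`, and so on) that mirror the instructions in the example.

```python
class Fault(Exception):
    """Stands in for a fatal exception that halts the program."""


class Machine:
    def __init__(self, mem):
        self.mem = mem
        self.reg = {}
        self.poison = set()     # names of registers whose poison bit is set

    def _check(self, *srcs):
        # A non-speculative use of a poisoned register is fatal.
        for r in srcs:
            if r in self.poison:
                raise Fault(f"poisoned register {r} used non-speculatively")

    def ld(self, rd, addr):     # non-speculative load
        if addr not in self.mem:
            raise Fault(addr)
        self.reg[rd] = self.mem[addr]
        self.poison.discard(rd)

    def sld(self, rd, addr):    # speculative load: poison instead of faulting
        if addr in self.mem:
            self.reg[rd] = self.mem[addr]
            self.poison.discard(rd)
        else:
            self.reg[rd] = 0
            self.poison.add(rd)

    def saddi(self, rd, rs, imm):   # speculative add: propagates poison
        self.reg[rd] = self.reg[rs] + imm
        if rs in self.poison:
            self.poison.add(rd)
        else:
            self.poison.discard(rd)

    def addi(self, rd, rs, imm):    # non-speculative add: clears dest poison
        self._check(rs)
        self.reg[rd] = self.reg[rs] + imm
        self.poison.discard(rd)

    def sd(self, rs, addr):         # non-speculative store
        self._check(rs)
        self.mem[addr] = self.reg[rs]


def run(mem):
    # if (A == 0) A = B + 8; else A = A + 4;
    m = Machine(mem)
    m.ld("R1", "A")
    m.sld("R12", "B")
    m.saddi("R14", "R12", 8)
    if m.reg["R1"] != 0:            # BEQZ R1,L3 not taken
        m.addi("R14", "R1", 4)
    m.sd("R14", "A")
    return mem


then_path = run({"A": 0, "B": 1})   # B readable, "then" clause result stored
else_path = run({"A": 5})           # B unreadable but never needed
try:
    run({"A": 0})                   # B unreadable and needed
    late_fault = False
except Fault:
    late_fault = True
```

In the last case the fault surfaces at the store rather than at the load, which is the "late" exception the slide describes.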
Speculative Insts. w. renaming
• Problem: What to do about data-flow when a speculative inst. writes a register that’s later used non-speculatively?
• Ordinary solution: Compiler does register renaming, writes speculative results to different (separately allocated) registers. (See sLD example)
    • Problem: Have to move values between normal & speculative registers, and can run out of registers!
• Alternative solution: (“Boosting”) Let the HW do the renaming & buffering of speculative results
    • Like in Tomasulo’s algorithm.
Sentinel Method
• Special “sentinel” instruction marks the original location of an instruction moved speculatively.
• Write-back (& exception handling) of the speculative instruction is delayed until the corresponding sentinel is reached.
• Note that write-back never occurs if the sentinel is not reached!
[Figure: code-motion diagram; labels: LD, BEQ, sentinel]
Hardware-Based Speculation
• Combines 3 ideas:
    • Dynamic branch prediction chooses which instructions will be pre-executed.
    • Speculation executes conditional instructions early (before branch conditions are resolved).
    • Dynamic scheduling handles scheduling of different dynamic sequences of basic blocks encountered.
• Dataflow execution: Execute instructions as soon as their operands are available.
    • Like with Tomasulo’s algorithm
Advantages of HW-based spec.
• Dynamic speculation can disambiguate memory references at run time, so a load can be moved ahead of an earlier store (if the locations addressed are different).
• Speculation works better if more accurate dynamic branch predictions can be used.
• Precise exception handling even for speculated instructions.
• No extra bookkeeping code (speculation bits, register renaming code) in the program.
• Code independent of implementation
Implementing HW-based spec.
• Separate the execution of speculative instructions (including dataflow between them) from the committing of results permanently to registers/memory (if speculations are correct).
• New structure called the reorder buffer holds results of instructions that have executed speculatively but cannot yet be committed.
• The reorder buffer represents non-programmer-visible temporary storage, like the reservation stations in Tomasulo’s algorithm.
Fields of Reorder Buffer Entries
• Instruction type field:
    • “Branch” (no dest.)
    • “Store” (dest.=memory)
    • “Register” (dest.=register)
• Destination field:
    • Register number (for loads & ALU ops)
    • Memory address (for stores)
• Value field:
    • Register or memory value to be stored permanently when instruction commits.
• Ready field: Instruction has completed
Steps of Execution in HWBS
• Issue (or dispatch):
    • Get next fetched instruction.
    • Issue if reservation station & reorder buffer not full.
    • Check ROB & registers for available operands
• Execute:
    • Monitor CDB for operands until ready, then execute
• Write result:
    • Write to CDB, reorder buffer, & reservation stations
• Commit:
    • When instruction is first in reorder buffer (& wasn’t mispredicted), commit value to register/memory.
    • Committing mispredicted branch flushes reorder buffer.
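The reorder buffer fields and the issue / write result / commit steps can be sketched as a small Python class. This is a simplified illustration, not the textbook's design: reservation stations, the CDB, and operand tracking are left out, and names such as `ReorderBuffer`, `Entry`, and `write_result` are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    kind: str                     # "branch", "store", or "register"
    dest: Optional[str] = None    # register name or memory address
    value: Optional[int] = None   # result to commit
    ready: bool = False           # instruction has completed
    mispredicted: bool = False    # only meaningful for branches


class ReorderBuffer:
    def __init__(self, size):
        self.size = size
        self.entries = deque()    # program order, oldest at the left
        self.regs = {}            # committed register state
        self.mem = {}             # committed memory state

    def issue(self, kind, dest=None):
        if len(self.entries) >= self.size:
            return None           # buffer full: issue stalls
        entry = Entry(kind, dest)
        self.entries.append(entry)
        return entry

    def write_result(self, entry, value=None, mispredicted=False):
        # Results are buffered here, not yet visible in regs/mem.
        entry.value = value
        entry.mispredicted = mispredicted
        entry.ready = True

    def commit(self):
        """Commit the oldest entry if ready; return its kind, else None."""
        if not self.entries or not self.entries[0].ready:
            return None
        entry = self.entries.popleft()
        if entry.kind == "register":
            self.regs[entry.dest] = entry.value
        elif entry.kind == "store":
            self.mem[entry.dest] = entry.value
        elif entry.kind == "branch" and entry.mispredicted:
            self.entries.clear()  # flush everything issued after the branch
        return entry.kind


rob = ReorderBuffer(4)
br = rob.issue("branch")
add = rob.issue("register", "R1")
st = rob.issue("store", "0x100")
rob.write_result(add, 42)                 # finishes before the branch resolves
first = rob.commit()                      # head (the branch) is not ready yet
rob.write_result(br, mispredicted=True)
second = rob.commit()                     # commits the branch and flushes
```

After the flush, `rob.regs` and `rob.mem` are still empty: the speculative add executed early, but its result was never committed.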
HWBS execution example (3rd ed., p. 229)
(I = issue, E = execute, W = write result, C = commit)
    L.D   F6,34(R2)     IEWC
    L.D   F2,45(R3)     IEWC
    MUL.D F0,F2,F4      I EEEEEEEEEEWC
    SUB.D F8,F6,F2      IEW C
    DIV.D F10,F0,F6     I EEEEE…EWC
    ADD.D F6,F8,F2      IEW C
Also go through figure 3.30 on p. 230… (40 cycles)
HWBS loop example
    Iter  Instruction           Stages
    1     L.D    F0,0(R1)       IEWC
    1     MUL.D  F4,F0,F2       I EE…EWC
    1     S.D    F4,0(R1)       IE WC
    1     DADDIU R1,R1,#-8
    1     BNE    R1,R2,Loop
    2     L.D    F0,0(R1)
    2     MUL.D  F4,F0,F2
    2     S.D    F4,0(R1)
    2     DADDIU R1,R1,#-8
    2     BNE    R1,R2,Loop
Explicit Register Renaming
• An alternative to reorder buffers for HWBS:
    • Have more physical registers than architectural (programmer-visible) registers.
    • Dynamically map destination ISA register to unused physical register when instruction is issued.
    • Also track which mapping corresponds to last committed instruction, to support restarts.
• Approach used in: PPC 603/604, MIPS R10000/12000, Alpha 21264, Pentium II/III/4
[Figure: ISA register map (R1, R2, …, F31) with LastIssued and LastCommitted mappings into the physical registers]
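The mapping scheme above can be sketched in Python with two maps and a free list. This is an illustrative model only: the class name `RenameTable`, the in-order `commit`, and the policy of freeing the previous mapping at commit are assumptions of the sketch, not a description of any of the processors listed.

```python
class RenameTable:
    def __init__(self, isa_regs, num_physical):
        assert num_physical > len(isa_regs)
        # Initially, ISA register i lives in physical register i.
        self.last_committed = {r: i for i, r in enumerate(isa_regs)}
        self.last_issued = dict(self.last_committed)
        self.free = list(range(len(isa_regs), num_physical))
        self.in_flight = []   # (isa_reg, new_phys, old_phys), program order

    def issue(self, dest, srcs):
        """Rename one instruction; return (dest_phys, src_phys) or None."""
        # Sources read the most recently issued mapping.
        src_phys = [self.last_issued[s] for s in srcs]
        if not self.free:
            return None       # no unused physical register: stall
        new = self.free.pop(0)
        self.in_flight.append((dest, new, self.last_issued[dest]))
        self.last_issued[dest] = new
        return new, src_phys

    def commit(self):
        # Oldest in-flight instruction becomes architectural state.
        dest, new, old = self.in_flight.pop(0)
        self.last_committed[dest] = new
        self.free.append(old)     # the previous mapping is no longer needed

    def restart(self):
        # Discard all uncommitted mappings (e.g. after a misprediction).
        for _dest, new, _old in self.in_flight:
            self.free.append(new)
        self.in_flight.clear()
        self.last_issued = dict(self.last_committed)


rt = RenameTable(["R1", "R2"], 4)       # R1 -> p0, R2 -> p1, free: p2, p3
first = rt.issue("R1", ["R1", "R2"])    # R1 renamed to p2, reads p0 and p1
second = rt.issue("R1", ["R1"])         # R1 renamed to p3, reads p2
third = rt.issue("R2", ["R1"])          # no free physical register left
rt.commit()                             # first instruction commits, p0 freed
rt.restart()                            # second instruction is squashed
```

After the restart, both maps agree that R1 lives in physical register 2, and the register allocated to the squashed instruction is back on the free list.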