Using Trace Cache In SMT. Huaxia Xia, June 6, 2001. Why use a trace cache in SMT? For a simultaneous multithreading machine, the bottleneck is the instruction fetch bandwidth. A trace cache is an efficient scheme to improve fetch bandwidth. What do we want to know?
Using Trace Cache In SMT
Huaxia Xia
June 6, 2001
Why use a trace cache in SMT?
• For a simultaneous multithreading machine, the bottleneck is the instruction fetch bandwidth.
• A trace cache is an efficient scheme to improve fetch bandwidth.
What do we want to know?
• The impact of the trace cache on single-threaded and simultaneous multithreaded execution
• The impact of different design options, such as cache associativity, cache size, and methods of dealing with unconditional branches
Current progress
New data structures:
• trace cache in context
• multiple branch predictor in context
• save instruction in scoreboard
• save branch type in scoreboard
New procedures:
• Fetch phase: get instructions from the trace cache, analyze and save the branch type
• Commit phase: update the predictor and fill the trace cache
Dealing with different branches
Predictable branches:
• Unconditional: BR, BSR, CALL_PAL
• Conditional: FBEQ, FBLT, FBLE, FBNE, FBGE, FBGT, BLBC, BEQ, BLT, BLE, BLBS, BNE, BGE, BGT
Unpredictable branches:
• Indirect jump: JMP
• Indirect call: JSR, JSR_COROUTINE
• RET
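The grouping on this slide can be sketched as a small classifier. This is only an illustration: the `opcode_t` enum, the `branch_class_t` names, and the helper functions are hypothetical and are not taken from the simulator. Only the mnemonics and their grouping come from the slide.

```c
#include <stdbool.h>

/* Hypothetical opcode enum for illustration; the simulator's own opcode
 * identifiers will differ. */
typedef enum {
    OP_BR, OP_BSR, OP_CALL_PAL,
    OP_FBEQ, OP_FBLT, OP_FBLE, OP_FBNE, OP_FBGE, OP_FBGT,
    OP_BLBC, OP_BEQ, OP_BLT, OP_BLE, OP_BLBS, OP_BNE, OP_BGE, OP_BGT,
    OP_JMP, OP_JSR, OP_JSR_COROUTINE, OP_RET,
    OP_OTHER /* any non-branch instruction */
} opcode_t;

/* Branch classes, named after the groups on the slide (names are assumed). */
typedef enum {
    BR_NONE,
    BR_UNCOND_DIRECT,  /* BR, BSR, CALL_PAL */
    BR_COND,           /* integer and FP conditional branches */
    BR_INDIRECT_JUMP,  /* JMP */
    BR_INDIRECT_CALL,  /* JSR, JSR_COROUTINE */
    BR_RETURN          /* RET */
} branch_class_t;

branch_class_t classify_branch(opcode_t op)
{
    switch (op) {
    case OP_BR: case OP_BSR: case OP_CALL_PAL:
        return BR_UNCOND_DIRECT;
    case OP_FBEQ: case OP_FBLT: case OP_FBLE:
    case OP_FBNE: case OP_FBGE: case OP_FBGT:
    case OP_BLBC: case OP_BEQ: case OP_BLT: case OP_BLE:
    case OP_BLBS: case OP_BNE: case OP_BGE: case OP_BGT:
        return BR_COND;
    case OP_JMP:
        return BR_INDIRECT_JUMP;
    case OP_JSR: case OP_JSR_COROUTINE:
        return BR_INDIRECT_CALL;
    case OP_RET:
        return BR_RETURN;
    default:
        return BR_NONE;
    }
}

/* "Predictable" in the slide's sense: direct unconditional or conditional. */
bool is_predictable(opcode_t op)
{
    branch_class_t c = classify_branch(op);
    return c == BR_UNCOND_DIRECT || c == BR_COND;
}
```

In a fetch stage, a classifier like this could supply the branch type that the slides say is saved in the scoreboard.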
Discussion
Large storage size for the trace cache?
• The trace cache potentially needs more entries than the branch predictor.
• It needs to deal with unconditional predictable branches as well as conditional branches.
• Different branches cannot share one entry even if they have the same branch behavior.
• Multiple threads cannot share one trace cache?
Discussion
Does a single thread or multithreading benefit more from the trace cache? Assume the trace cache size is fixed.
• For a single thread, the issue bandwidth is small, so regular prefetching seems sufficient, but the trace cache can reduce the miss rate.
• For multithreading, the issue bandwidth is high, but a relatively small trace cache causes more conflicts.
if ((MD_OP_FLAGS(op) & (F_CTRL|F_UNCOND)) == (F_CTRL|F_UNCOND)) : the instruction is an unconditional branch (direct or indirect; these include BR, BSR, JSR, JMP, JSR_COROUTINE, RET for the Alpha AXP)
if ((MD_OP_FLAGS(op) & (F_CTRL|F_COND)) == (F_CTRL|F_COND)) : the instruction is a conditional branch (direct or indirect; these include the integer and FP conditional branches for the Alpha AXP)
if ((MD_OP_FLAGS(op) & (F_CTRL|F_DIRJMP)) == (F_CTRL|F_DIRJMP)) : the instruction is a direct branch (conditional or unconditional; these include the integer and FP conditional branches, BR and BSR for the Alpha AXP)
if ((MD_OP_FLAGS(op) & (F_CTRL|F_INDIRJMP)) == (F_CTRL|F_INDIRJMP)) : the instruction is an indirect branch (conditional or unconditional; these include JSR, JMP, JSR_COROUTINE, RET for the Alpha AXP)
if ((MD_OP_FLAGS(op) & (F_CTRL|F_CALL)) == (F_CTRL|F_CALL)) : the instruction is a procedure call (JSR, BSR for the Alpha AXP)
if ((MD_OP_FLAGS(op) & (F_CTRL|F_FPCOND)) == (F_CTRL|F_FPCOND)) : the instruction is an FP conditional branch
If any of the conditions is false, please let me know the correct one.
Predictable:
• Uncond_pred: BR and BSR
• FBEQ, FBLT, FBLE, FBNE, FBGE, FBGT, BLBC, BEQ, BLT, BLE, BLBS, BNE, BGE, BGT, CALL_PAL
Unpredictable (unpred): JSR, JMP, JSR_COROUTINE, RET
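One detail worth checking in mask tests like these: `flags & (A|B)` is nonzero when either bit is set, while `(flags & (A|B)) == (A|B)` is true only when both are set. The sketch below illustrates the difference. The `EX_F_*` bit values are stand-ins invented for this example and are not the simulator's definitions, and `has_any` and `has_all` are hypothetical helper names.

```c
#include <stdbool.h>

/* Stand-in flag bits for illustration only; the real F_* values are
 * defined in the simulator's machine headers and may differ. */
enum {
    EX_F_CTRL     = 0x01,
    EX_F_UNCOND   = 0x02,
    EX_F_COND     = 0x04,
    EX_F_DIRJMP   = 0x08,
    EX_F_INDIRJMP = 0x10
};

/* True if at least one bit of mask is set in flags. */
bool has_any(unsigned flags, unsigned mask)
{
    return (flags & mask) != 0;
}

/* True only if every bit of mask is set in flags. */
bool has_all(unsigned flags, unsigned mask)
{
    return (flags & mask) == mask;
}

/* Flags a direct conditional branch such as BEQ would carry under the
 * stand-in encoding above. */
unsigned example_cond_branch_flags(void)
{
    return EX_F_CTRL | EX_F_COND | EX_F_DIRJMP;
}
```

With these stand-in values, a conditional branch passes `has_any(flags, EX_F_CTRL | EX_F_UNCOND)` because `EX_F_CTRL` alone is enough, but it fails `has_all` with the same mask. To test for "control and unconditional", the all-bits form is the one needed.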
beq            Branch if Equal to Zero
bne            Branch if Not Equal to Zero
blt            Branch if Less Than Zero
ble            Branch if Less Than or Equal to Zero
bgt            Branch if Greater Than Zero
bge            Branch if Greater Than or Equal to Zero
blbc           Branch if Low Bit is Clear
blbs           Branch if Low Bit is Set
br             Branch Always
bsr            Branch to Subroutine
jmp            Jump
jsr            Jump to Subroutine
ret            Return from Subroutine
jsr_coroutine  Jump to Subroutine Return
Data structure of trace cache
typedef struct TraceCache {
    address_t tag;               // the address of the first branch
    unsigned char branchcount;   // # of branches in this trace line
    unsigned char branchpred;    // prediction for the branches
    unsigned char instrcount;    // # of instructions
    unsigned char blockindex[3]; // index of the basic blocks, bi[0]==0
    address_t addr[3];           // the starting addresses of the basic blocks
    instruction_t instr[16];     // the instructions stored in this trace line
} TraceCache;
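A minimal sketch of how a line of this shape might be looked up and filled, assuming a direct-mapped organization. Everything beyond the struct fields is an assumption made for illustration: the typedefs for `address_t` and `instruction_t`, the line count `TC_LINES`, the index function, treating `instrcount == 0` as "invalid", and the `tc_lookup` and `tc_fill` names. The slides leave associativity and size open as design options.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t address_t;      /* assumption: 64-bit addresses */
typedef uint32_t instruction_t;  /* assumption: 32-bit instruction words */

typedef struct TraceCache {
    address_t tag;               /* tag address of the trace */
    unsigned char branchcount;   /* # of branches in this trace line */
    unsigned char branchpred;    /* one prediction bit per branch */
    unsigned char instrcount;    /* # of instructions; 0 = invalid (assumed) */
    unsigned char blockindex[3]; /* index of the basic blocks */
    address_t addr[3];           /* starting addresses of the basic blocks */
    instruction_t instr[16];
} TraceCache;

#define TC_LINES 64              /* hypothetical number of lines */

static TraceCache tc[TC_LINES];  /* zero-initialized: every line invalid */

/* Direct-mapped index: drop the 2 byte-offset bits, then wrap. */
static size_t tc_index(address_t a)
{
    return (size_t)((a >> 2) % TC_LINES);
}

/* Return the matching line, or NULL on a miss. A hit requires a valid line,
 * an equal tag, and agreement between the stored predictions and the current
 * predictions for the branches the line actually contains. */
const TraceCache *tc_lookup(address_t tag_addr, unsigned char pred)
{
    const TraceCache *line = &tc[tc_index(tag_addr)];
    if (line->instrcount == 0 || line->tag != tag_addr)
        return NULL;
    unsigned char mask = (unsigned char)((1u << line->branchcount) - 1u);
    if ((line->branchpred & mask) != (pred & mask))
        return NULL;
    return line;
}

/* Install a completed trace, replacing whatever occupied its slot. */
void tc_fill(const TraceCache *trace)
{
    tc[tc_index(trace->tag)] = *trace;
}
```

In the flow the slides describe, something like `tc_lookup` would run in the fetch phase and `tc_fill` in the commit phase. Two traces whose tags map to the same index evict each other here, which is the kind of conflict the discussion slides raise for small trace caches.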