CS 7810 Lecture 21
Threaded Multiple Path Execution
S. Wallace, B. Calder, D. Tullsen
Proceedings of ISCA-25, June 1998

Leveraging SMT
• Recall branch fan-out from “Limits of ILP”
• Future processors will likely have no shortage of idle thread contexts
• Spawned threads are parallel, but have dependences with earlier instructions: registers, uncommitted stores, data cache values
• SMT may be an ideal candidate as threads share the same set of resources

SMT vs. CMP
• A multi-threaded workload (on an SMT) is more tolerant of branch mispredictions, so TME makes most sense if there is a shortage of threads
• Power overheads are enormous: on an SMT, we may not have the option to execute speculative threads on low-power pipelines
• What about energy?
• Is CMP a better candidate?
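
Renaming Overview
[Figure: a code sequence that writes and reads r1 on both sides of a branch, shown next to its renamed form using physical registers p1, p5, and p3; r1 initially maps to p1]
• Every branch causes a checkpoint of mappings, so we can recover quickly on a mispredict
• Each thread in the SMT can have 8 checkpoints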
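
The checkpoint-and-recover idea above can be sketched as a small rename map. This is only an illustration: the class name `RenameMap`, the monotonic free-register counter, and the rule that recovery also discards younger checkpoints are assumptions made for the sketch, not details from the paper.

```python
class RenameMap:
    """Toy rename table with per-branch checkpoints (hypothetical model)."""

    def __init__(self, num_arch_regs=32, max_checkpoints=8):
        # Assumption: architectural register rN initially maps to pN.
        self.table = list(range(num_arch_regs))
        self.next_free = num_arch_regs      # simplistic free list
        self.max_checkpoints = max_checkpoints
        self.checkpoints = {}               # branch id -> saved table

    def rename_dest(self, arch_reg):
        """Allocate a new physical register for a write to arch_reg."""
        phys = self.next_free
        self.next_free += 1
        self.table[arch_reg] = phys
        return phys

    def checkpoint(self, branch_id):
        """Save the current mappings; fail if all checkpoints are in use."""
        if len(self.checkpoints) >= self.max_checkpoints:
            return False
        self.checkpoints[branch_id] = list(self.table)
        return True

    def recover(self, branch_id):
        """Restore mappings saved at branch_id and drop younger checkpoints."""
        ids = list(self.checkpoints)        # dicts keep insertion order
        start = ids.index(branch_id)
        self.table = self.checkpoints[branch_id]
        for stale in ids[start:]:
            del self.checkpoints[stale]


rmap = RenameMap()
first = rmap.rename_dest(1)     # write to r1 before the branch
rmap.checkpoint("br0")          # branch: snapshot the mappings
second = rmap.rename_dest(1)    # speculative write to r1 after the branch
rmap.recover("br0")             # mispredict: r1 maps to `first` again
```

The sketch does not reclaim the physical registers allocated on the squashed path; a real design would return them to the free list.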

Threaded Multi-Path Execution
• Key elements in TME:
• Identifying low-confidence branches
• Efficient thread spawning
• Efficient recovery on branch resolution
• Fetch priorities for each thread on SMT

Path Selection
• Only the primary path can spawn threads (prevents an exponential increase in threads)
• For each bpred entry, keep track of successive correct predictions (reset on mispredict); if the counter is less than a threshold, the branch is low-confidence
• Note that a small counter size is more selective in picking low-confidence branches
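
The resetting-counter confidence scheme described above can be sketched as follows. The table size, the PC-modulo indexing, and the default threshold (the counter's saturation value) are assumptions chosen for illustration.

```python
class ResettingConfidence:
    """Per-entry count of successive correct predictions (illustrative)."""

    def __init__(self, entries=1024, counter_bits=3, threshold=None):
        self.max_count = (1 << counter_bits) - 1
        # Assumption: by default a branch is high-confidence only when
        # its counter is saturated.
        self.threshold = self.max_count if threshold is None else threshold
        self.counters = [0] * entries

    def _index(self, pc):
        # Assumption: simple modulo indexing by branch PC.
        return pc % len(self.counters)

    def is_low_confidence(self, pc):
        return self.counters[self._index(pc)] < self.threshold

    def update(self, pc, prediction_correct):
        i = self._index(pc)
        if prediction_correct:
            self.counters[i] = min(self.counters[i] + 1, self.max_count)
        else:
            self.counters[i] = 0    # reset on mispredict


conf = ResettingConfidence(counter_bits=2)   # saturates at 3
for _ in range(3):
    conf.update(0x40, prediction_correct=True)
# After three correct predictions the branch is no longer low-confidence.
```

With fewer counter bits the counter saturates after fewer correct predictions, so fewer branches are flagged as low-confidence, which matches the selectivity remark on the slide.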

Register Mappings
• In SMT, each thread can read any physical register
• Thread spawning requires a copy of the register mappings at that branch
• A copy involves transfer of (32 x 9 bits); the new thread cannot begin renaming until this copy is complete, and the copy may also hold up the primary thread if map table read ports are scarce
• Every new mapping can be placed on a bus and idle threads can snoop and keep pace
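
As a worked check of the copy cost and a sketch of the snooping alternative: 32 architectural registers with 9-bit physical tags give 288 bits per copy, and 9 bits can name up to 512 physical registers. The names `IdleContext` and `broadcast_mapping` below are hypothetical, and the bus is modelled as a plain function call.

```python
NUM_ARCH_REGS = 32
PHYS_TAG_BITS = 9

copy_bits = NUM_ARCH_REGS * PHYS_TAG_BITS      # 288 bits per full copy
addressable_phys_regs = 1 << PHYS_TAG_BITS     # 512 registers


class IdleContext:
    """An idle context that mirrors the primary thread's map table."""

    def __init__(self, primary_table):
        self.table = list(primary_table)       # one-time full copy

    def snoop(self, arch_reg, phys_reg):
        # Apply a mapping update observed on the (modelled) mapping bus.
        self.table[arch_reg] = phys_reg


def broadcast_mapping(primary_table, idle_contexts, arch_reg, phys_reg):
    """Update the primary table and let every idle context snoop it."""
    primary_table[arch_reg] = phys_reg
    for ctx in idle_contexts:
        ctx.snoop(arch_reg, phys_reg)


primary = list(range(NUM_ARCH_REGS))
idle = [IdleContext(primary), IdleContext(primary)]
broadcast_mapping(primary, idle, arch_reg=1, phys_reg=40)
# Each idle table now matches the primary, so a spawn needs no bulk copy.
```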

Spawning Algorithm
• When threads are idle, they keep pace and spawn a thread as soon as a low-confidence branch is encountered
• When a thread context becomes free and a low-confidence checkpoint already exists, the new context synchronizes mappings with the primary context and executes the primary path, while the old primary context executes the alternate path after reinstating the checkpoint
• If a newly idle thread has a low-confidence checkpoint, it starts executing the alternate path
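
The three cases above can be read as a decision function for a spare context. This is a loose sketch: the flag names, the `Action` enum, and the order in which the cases are tested are assumptions, since the slide does not say how the cases are prioritised.

```python
from enum import Enum, auto


class Action(Enum):
    KEEP_PACE = auto()          # keep snooping mappings, nothing to spawn
    SPAWN_ALTERNATE = auto()    # idle, in-sync context takes the alternate path
    SWAP_WITH_PRIMARY = auto()  # new context takes over the primary path
    RUN_OWN_ALTERNATE = auto()  # context follows its own checkpoint


def spawn_decision(newly_freed, in_sync, low_conf_branch_now,
                   own_low_conf_checkpoint, primary_low_conf_checkpoint):
    """Pick an action for one spare context (hypothetical priority order)."""
    if newly_freed and own_low_conf_checkpoint:
        # The context already holds a low-confidence checkpoint.
        return Action.RUN_OWN_ALTERNATE
    if newly_freed and primary_low_conf_checkpoint:
        # Sync mappings and continue the primary path; the old primary
        # reinstates its checkpoint and runs the alternate path.
        return Action.SWAP_WITH_PRIMARY
    if in_sync and low_conf_branch_now:
        return Action.SPAWN_ALTERNATE
    return Action.KEEP_PACE


choice = spawn_decision(newly_freed=False, in_sync=True,
                        low_conf_branch_now=True,
                        own_low_conf_checkpoint=False,
                        primary_low_conf_checkpoint=False)
```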

Introduced Complexity
• Book-keeping to manage checkpoint locations: every branch has to track the location of its checkpoint
• Who frees a register value?
• What about memory dependences?
• Loads can ignore stores that are not predecessors
• Maintain an array of bits to represent the path taken (each basic block corresponds to a bit in the array)
• Check for memory dependences only if the store’s path is a subset of the load’s path
[Figure: successive mappings of r1 to physical registers p5, p7, and p8]
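
The path-subset test for memory dependences can be illustrated with integer bitmasks, one bit per basic block. The function names and the example bit assignments are hypothetical.

```python
def path_is_subset(store_path, load_path):
    """True if every basic block on the store's path is on the load's path."""
    return (store_path & ~load_path) == 0


def must_check_dependence(store_path, load_path):
    """A load only needs to consider stores that are its predecessors."""
    return path_is_subset(store_path, load_path)


# Assumed encoding: bit i is set if basic block i lies on the path.
load_path = 0b0111          # load reached through blocks 0, 1, 2
store_on_path = 0b0011      # store issued after blocks 0, 1
store_off_path = 0b1011     # store on a path that went through block 3

check_a = must_check_dependence(store_on_path, load_path)    # predecessor
check_b = must_check_dependence(store_off_path, load_path)   # other path
```

In this encoding the first store is a predecessor of the load and must be checked, while the second belongs to a different path and can be ignored.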

Processor Parameters
• Eight-wide processor with up to eight contexts; each context has eight checkpoints
• 32-entry issue queues, 4Kb gshare branch predictor, 7-cycle misprediction penalty, memory latency of 62 cycles
• ICOUNT 2.8: the first thread can bring in up to 8 instructions and the second thread fills in unused slots; occupancy in the front-end determines priority
• Focus on branch-limited programs: compress (20%), gcc (18%), go (30%), li (6%)
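
A minimal sketch of the ICOUNT 2.8 slot allocation described above, assuming that lower front-end occupancy means higher priority and that each thread reports how many instructions it could supply this cycle. The function name and dictionary-based interface are illustrative only.

```python
def icount_2_8(front_end_counts, available, width=8):
    """Split up to `width` fetch slots between the two top-priority threads.

    front_end_counts: thread id -> instructions currently in the front end
    available:        thread id -> instructions the thread can supply now
    """
    # Fewest front-end instructions first; keep the best two threads.
    chosen = sorted(front_end_counts, key=front_end_counts.get)[:2]
    slots = {}
    remaining = width
    for tid in chosen:
        take = min(available.get(tid, 0), remaining)
        slots[tid] = take           # second thread fills leftover slots
        remaining -= take
    return slots


slots = icount_2_8(front_end_counts={0: 12, 1: 3, 2: 7},
                   available={0: 8, 1: 5, 2: 8})
# Thread 1 (lowest occupancy) fetches 5; thread 2 fills the other 3 slots.
```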

Conclusions
• Too much complexity/power overhead, too little benefit?
• Benefits may be higher for deeper pipelines, larger windows (this paper evaluates 8 windows of 48 instructions; does 2 x 192 yield better results?), and longer memory latencies
• There is room for improvement with better branch confidence metrics
• CMPs will incur greater cost during thread spawning, but may be more power-efficient