Is SC + ILP = RC? Quinn Gaumer Duke University
Outline • Motivation • Previous SC optimizations • SC++ Implementation • SC++ Analysis
SC vs. RC • SC • Easy programming model (no different from a uniprocessor) • Slower programs • RC • Faster programs (20%) • Requires software assistance
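To make the programming-model difference concrete, here is a minimal sketch of the classic store-buffering litmus test. It is an illustration, not part of the original slides: the names `interleavings`, `run`, `P0`, and `P1` are hypothetical, and the "relaxed" case only models a load being performed before the same processor's earlier store, which is a simplification of what RC permits.

```python
def interleavings(a, b):
    """Yield every merge of a and b that keeps each sequence's own order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(schedule):
    """Execute a global order of memory ops and return (r1, r2)."""
    mem = {"x": 0, "y": 0}
    regs = {}
    for op, addr, reg in schedule:
        if op == "st":
            mem[addr] = 1
        else:
            regs[reg] = mem[addr]
    return regs["r1"], regs["r2"]


# Each processor stores to one location, then loads the other.
P0 = [("st", "x", None), ("ld", "y", "r1")]
P1 = [("st", "y", None), ("ld", "x", "r2")]

# SC: every outcome comes from some interleaving that respects program order.
sc_outcomes = {run(s) for s in interleavings(P0, P1)}

# Simplified relaxed model (assumption): a load may also be performed
# before the same processor's earlier store.
relaxed_outcomes = set()
for p0 in (P0, P0[::-1]):
    for p1 in (P1, P1[::-1]):
        relaxed_outcomes |= {run(s) for s in interleavings(p0, p1)}

print(sorted(sc_outcomes))
print(sorted(relaxed_outcomes))
```

Under SC the outcome where both loads return 0 never appears; once loads may pass earlier stores it does, which is why the relaxed model needs software assistance (explicit synchronization) to rule it out.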
Previous SC optimizations • Load forwarding • Loads can return values even if other memory ops are pending • When is this good? • As long as it's not exposed to other processors • When is it wrong? • When an invalidation is received before the speculative load is retired • Problem: ROB can still fill up due to store at head…
Previous SC optimizations • Store Buffering • Waiting stores moved to LSQ • Problem: the reorder buffer must still stop retiring loads if stores are pending.
SC++ • Store-Store bypassing • Speculative State for Memory • Speculation Support • Rollbacks infrequent
Store-Store Bypassing • Speculative History Queue (SHiQ) • Holds stores and completed instructions • Also holds the information needed to roll back operations • [Figure: SHiQ holding Store, OP, and Store entries, with head pointers]
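The queue described on this slide can be sketched as a small data structure. This is a minimal sketch under stated assumptions, not the hardware design: the class names `Entry` and `SHiQ`, the fields `old_value` and `performed`, and the rule that entries leave from the head only once the store at the head has been performed are all hypothetical choices made for illustration.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    kind: str                        # "store" or "op" (any other completed instruction)
    addr: Optional[int] = None       # address written by a store
    old_value: Optional[int] = None  # rollback information: value before the store
    performed: bool = True           # False while a store is still pending


class SHiQ:
    """Hypothetical model of a bounded speculative history queue."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = deque()

    def push(self, entry: Entry) -> bool:
        """Append a retired instruction; return False (stall) if the queue is full."""
        if len(self.entries) >= self.capacity:
            return False
        self.entries.append(entry)
        return True

    def drain(self) -> int:
        """Free entries from the head until a still-pending store is reached."""
        freed = 0
        while self.entries and self.entries[0].performed:
            self.entries.popleft()
            freed += 1
        return freed


q = SHiQ(capacity=4)
q.push(Entry("store", addr=0x40, old_value=7, performed=False))
q.push(Entry("op"))
q.push(Entry("store", addr=0x80, old_value=3, performed=True))

freed_while_pending = q.drain()   # head store still pending: nothing leaves
q.entries[0].performed = True     # the head store completes
freed_after_complete = q.drain()  # everything queued behind it can now leave
```

In this sketch a long-latency store at the head keeps every later entry in the queue, so a small capacity makes `push` fail and the processor stall.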
Memory Order Violations • When is SC violated? • A speculatively loaded or stored block is invalidated or read by another processor. • How is a violation detected? • Block Lookup Table (BLT) contains addresses of speculative memory ops • Invalidations, replacements, and downgrades cause a search of the BLT for the address
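The detection step can be sketched as a table keyed by block address. This is an illustrative sketch only: the class `BlockLookupTable`, its method names, the per-block reference count, and the 64-byte block size are assumptions, not details taken from the slides.

```python
class BlockLookupTable:
    """Hypothetical model: tracks which cache blocks have speculative memory ops."""

    EVENTS = ("invalidation", "replacement", "downgrade")

    def __init__(self, block_size: int = 64):  # block size is an assumption
        self.block_size = block_size
        self.refs = {}  # block number -> count of speculative ops touching it

    def _block(self, addr: int) -> int:
        return addr // self.block_size

    def add(self, addr: int) -> None:
        """Record a speculative load or store to addr."""
        block = self._block(addr)
        self.refs[block] = self.refs.get(block, 0) + 1

    def remove(self, addr: int) -> None:
        """Drop one speculative op once it is no longer speculative."""
        block = self._block(addr)
        self.refs[block] -= 1
        if self.refs[block] == 0:
            del self.refs[block]

    def violates(self, event: str, addr: int) -> bool:
        """Return True if a coherence event hits a block with speculative ops."""
        if event not in self.EVENTS:
            raise ValueError(f"unknown coherence event: {event}")
        return self._block(addr) in self.refs


blt = BlockLookupTable()
blt.add(0x100)                                   # speculative access to block 4
hit_same_block = blt.violates("invalidation", 0x120)   # same 64-byte block
hit_other_block = blt.violates("downgrade", 0x140)     # next block
blt.remove(0x100)
hit_after_remove = blt.violates("replacement", 0x100)
```

Matching at block granularity is what makes false sharing show up as a violation in this sketch: an event for a different word in the same block still hits the table.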
Rollback • Processor and memory state must be rolled back to the first memory operation that accessed the offending block • How is forward progress guaranteed? • Speculation is prohibited until all pending stores are performed. • Rollback can be slow • Requires flushing the pipeline and moving data between local caches. • Optimizations • Roll back multiple instructions per cycle • Send responses to invalidations immediately
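The memory side of the rollback rule on this slide can be sketched as an undo over a history log. This is a simplified sketch, assuming each logged store carries the value it overwrote; the names `Op`, `rollback`, and the dictionary standing in for memory are hypothetical, and register state and pipeline flushing are not modeled.

```python
from collections import namedtuple

# One logged speculative operation; old_value is only meaningful for stores.
Op = namedtuple("Op", ["seq", "kind", "block", "old_value"])


def rollback(memory, history, offending_block):
    """Undo speculative stores back to the first op that touched offending_block.

    Returns the sequence number to restart execution from, or None if no
    logged operation accessed the block.
    """
    first = next(
        (i for i, op in enumerate(history) if op.block == offending_block), None
    )
    if first is None:
        return None
    # Undo youngest-first so each store restores the value it overwrote.
    for op in reversed(history[first:]):
        if op.kind == "store":
            memory[op.block] = op.old_value
    restart_seq = history[first].seq
    del history[first:]  # the undone operations are no longer speculative state
    return restart_seq


# Memory after four speculative ops; the original values were A=1, B=2, C=3.
memory = {"A": 10, "B": 20, "C": 30}
history = [
    Op(0, "store", "A", 1),
    Op(1, "load", "B", None),
    Op(2, "store", "C", 3),
    Op(3, "store", "B", 2),
]

restart = rollback(memory, history, "B")
```

Here the store to A survives because it is older than the first access to the offending block B, while the later stores to C and B are undone and execution would restart at the load of B.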
Qualitative Analysis • Must hold state to allow rollback of both processor and memory • Must detect rollbacks quickly • Rollbacks are extremely slow… Does it matter? • Data Races • False Sharing • Cache Conflicts
Results • SC++ theoretically performs as well as RC • SC++ can be physically limited in several ways • Network Latency • SHiQ Size • Cache Size • Why does each affect the speed of SC++ (relative to SC or RC)?