CS 7810 Lecture 8 • Memory Dependence Prediction using Store Sets • G.Z. Chrysos and J.S. Emer • Proceedings of ISCA-25, 1998
LSQ Basics • An incomplete store stalls all future loads (No Speculation) • The paper's No Speculation model is overly conservative because it also waits for store values • Most of these stalls are unnecessary: they are artificial dependences
Aggressive Approach • Assume that loads do not conflict with earlier stores: all loads and stores execute out of order (Naive Speculation) • When there is a conflict, the load behaves like a branch mispredict: all subsequent instructions are squashed and re-fetched • Expensive: 30-cycle penalty • Rename checkpoints for all instructions • Re-execute only the dependent instructions? More complex, better performance
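The conflict check behind naive speculation can be sketched as follows. This is a minimal illustration, not the paper's hardware: the function names, the `(seq, addr)` tuple layout, and the use of program-order sequence numbers are all assumptions made for the example. When a store finally resolves its address, it looks for younger loads to the same address that have already executed; the oldest such load and everything after it are squashed.

```python
def find_violation(store_seq, store_addr, executed_loads):
    """Return the sequence number of the oldest younger load that already
    executed to the same address, or None if there is no violation.

    executed_loads is a list of (seq, addr) pairs (hypothetical layout).
    """
    victims = [seq for seq, addr in executed_loads
               if seq > store_seq and addr == store_addr]
    return min(victims) if victims else None


def squash_from(seq, in_flight):
    """Keep only instructions older than seq; the rest are re-fetched."""
    return [s for s in in_flight if s < seq]


# Loads 5 and 9 read address 0x40 before store 3 resolved to 0x40.
executed_loads = [(5, 0x40), (7, 0x80), (9, 0x40)]
victim = find_violation(3, 0x40, executed_loads)   # oldest violating load
survivors = squash_from(victim, [2, 3, 5, 6, 7, 9])
```

Squashing from the oldest violating load mirrors the branch-mispredict style recovery on the slide; selective re-execution of only the dependent instructions would need extra tracking that this sketch omits.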
Ideal Model • In the perfect model, loads only wait for conflicting stores: no artificial dependences and no memory-order violations
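To contrast the two extremes, here is a small sketch of the issue condition under No Speculation and under the ideal model. The `Store` record and both function names are hypothetical; `done` stands for "address and value are both known", matching the conservative assumption mentioned earlier in the lecture.

```python
from dataclasses import dataclass


@dataclass
class Store:
    seq: int    # program-order sequence number
    addr: int   # target address (known to the oracle in the ideal model)
    done: bool  # True once address and value are both available


def can_issue_no_speculation(load_seq, stores):
    """Wait for every earlier store, whatever its address."""
    return all(s.done for s in stores if s.seq < load_seq)


def can_issue_ideal(load_seq, load_addr, stores):
    """Wait only for earlier stores that write the load's address."""
    return all(s.done for s in stores
               if s.seq < load_seq and s.addr == load_addr)


stores = [Store(seq=1, addr=0x100, done=False),
          Store(seq=2, addr=0x200, done=True)]
```

A load at sequence 3 to address 0x200 is blocked under No Speculation by the unrelated store to 0x100, but issues immediately in the ideal model. That gap is the artificial dependence the predictor tries to remove.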
Store Sets Concept • For every load, keep track of all stores that it has conflicted with in the past • A load does not issue if members of its store set have not finished (dependences are introduced at the time of dispatch) • The implementation is easy if (a) a load depends on only one store and (b) a store is present in only one store set
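The concept can be sketched with unbounded tables, ignoring the hardware restrictions listed on the slide. The class and method names are illustrative assumptions; loads and stores are identified by their PCs.

```python
from collections import defaultdict


class StoreSets:
    """Unbounded store sets: load PC -> PCs of stores it has conflicted with."""

    def __init__(self):
        self.sets = defaultdict(set)

    def record_violation(self, load_pc, store_pc):
        """Called when a load is found to have executed before a conflicting store."""
        self.sets[load_pc].add(store_pc)

    def must_wait(self, load_pc, unfinished_store_pcs):
        """True if any member of the load's store set is still unfinished."""
        return bool(self.sets.get(load_pc, set()) & set(unfinished_store_pcs))


predictor = StoreSets()
first_time = predictor.must_wait(0x10, [0x20])        # no history yet
predictor.record_violation(0x10, 0x20)
after_violation = predictor.must_wait(0x10, [0x20, 0x30])
unrelated_only = predictor.must_wait(0x10, [0x30])
```

A real design cannot afford a set per load, which is why the slide notes that one store per load and one set per store make the implementation easy.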
Trivial Implementations • Execution time normalized to an ideal store set implementation
Ideal Store Set Predictor • An occasional memory-order violation can introduce many false dependences; hence, use saturating counters
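A saturating counter per store-set entry might look like the sketch below. The 2-bit width, the initial value, the threshold at the midpoint, and the update policy (increment on a violation, decrement when an enforced dependence turns out to be unnecessary) are assumptions chosen for illustration, not details taken from the slide.

```python
class SaturatingCounter:
    """n-bit counter that sticks at 0 and at 2**n - 1."""

    def __init__(self, bits=2, value=0):
        self.max = (1 << bits) - 1
        self.value = value

    def inc(self):
        self.value = min(self.max, self.value + 1)

    def dec(self):
        self.value = max(0, self.value - 1)

    def predict_dependence(self):
        # Enforce the dependence only in the upper half of the range.
        return self.value >= (self.max + 1) // 2


c = SaturatingCounter()
c.inc()                      # a single, occasional violation
one_violation = c.predict_dependence()
c.inc()                      # the conflict repeats
two_violations = c.predict_dependence()
```

With this policy, one stray violation does not create a lasting dependence, while a recurring conflict does.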
Implementation Overview • Every ld/st depends on the last store in its set • Causes serialized stores and false dependences • [Figure: a sequence of stores (st)]
Store Set Implementation • Every load and store belongs to one color • Keep track of the last writer for each color (mispredicts can pose problems) • Colors are merged as you discover memory-order violations
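The coloring scheme can be sketched with two tables: one mapping a PC to its color (the SSIT) and one mapping a color to its last fetched store (the LFST). Everything below is a simplified illustration under stated assumptions: the PC is indexed with a plain modulo, colors are allocated round-robin, the merge rule keeps the smaller color, and `tag` is a hypothetical identifier for an in-flight store.

```python
class StoreSetPredictor:
    def __init__(self, ssit_entries=4096, lfst_entries=128):
        self.ssit = [None] * ssit_entries   # PC index -> color (store set ID)
        self.lfst = [None] * lfst_entries   # color -> tag of last fetched store
        self.next_color = 0

    def _index(self, pc):
        return pc % len(self.ssit)          # assumed indexing function

    def dispatch_load(self, pc):
        """Return the tag of the store this load must wait for, or None."""
        color = self.ssit[self._index(pc)]
        return None if color is None else self.lfst[color]

    def dispatch_store(self, pc, tag):
        """Return the earlier store this store is ordered behind, or None,
        and become the last writer of the color."""
        color = self.ssit[self._index(pc)]
        if color is None:
            return None
        previous = self.lfst[color]
        self.lfst[color] = tag
        return previous

    def store_issued(self, pc, tag):
        """Clear the last-writer entry if it still names this store."""
        color = self.ssit[self._index(pc)]
        if color is not None and self.lfst[color] == tag:
            self.lfst[color] = None

    def violation(self, load_pc, store_pc):
        """Give the load and store a common color, merging if both have one."""
        li, si = self._index(load_pc), self._index(store_pc)
        lc, sc = self.ssit[li], self.ssit[si]
        if lc is None and sc is None:
            color = self.next_color
            self.next_color = (self.next_color + 1) % len(self.lfst)
        elif lc is None:
            color = sc
        elif sc is None:
            color = lc
        else:
            color = min(lc, sc)             # assumed merge rule
        self.ssit[li] = self.ssit[si] = color


p = StoreSetPredictor()
p.violation(load_pc=0x400, store_pc=0x300)   # both get color 0
p.dispatch_store(0x300, tag=17)
wait_for = p.dispatch_load(0x400)            # load waits for store 17
ordered_behind = p.dispatch_store(0x300, tag=18)
```

Note that `violation` only rewrites the two entries involved. Other members of a merged set keep their old color until they are involved in a violation themselves.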
Store Set Merging • Store set merging improves performance by 12% • Note that merging happens gradually: no need to instantly correct all entries in the table
Design Details • Merging store sets • To deal with occasional dependences and conflicts: clear the table every million cycles; use saturating counters for each entry • The SSIT (Store Set ID Table) needs 4K entries and the LFST (Last Fetched Store Table) needs 128 entries
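Periodic clearing is simple to express. The sketch below is an assumption-laden illustration: the class name, the `tick` interface, and the choice to wipe every entry at each interval boundary are made up for the example; only the 4K-entry size and the million-cycle interval come from the slide.

```python
class CyclicallyClearedTable:
    def __init__(self, entries=4096, clear_interval=1_000_000):
        self.entries = [None] * entries
        self.clear_interval = clear_interval
        self.cycle = 0

    def tick(self, cycles=1):
        """Advance time; wipe the table whenever an interval boundary is crossed."""
        before = self.cycle // self.clear_interval
        self.cycle += cycles
        if self.cycle // self.clear_interval != before:
            self.entries = [None] * len(self.entries)


# Small parameters so the behaviour is easy to follow.
table = CyclicallyClearedTable(entries=8, clear_interval=10)
table.entries[3] = 5
table.tick(9)
kept = table.entries[3]       # still inside the first interval
table.tick(1)
cleared = table.entries[3]    # boundary crossed, entry forgotten
```

Clearing discards stale dependences that no longer occur, at the cost of relearning the real ones through fresh violations.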
Related Work • Store barrier cache: identify stores that are likely to pose conflicts • Keep track of all store-load conflict pairs and associatively check for dependences while dispatching instructions
Next Week’s Paper • “Effective Hardware-Based Prefetching for High-Performance Microprocessors”, T.F. Chen and J.L. Baer, IEEE Transactions on Computers, May 1995