Quantifying and Comparing the Impact of Wrong-Path Memory References in Multiple-CMP Systems
Ayse Yilmazer, University of Rhode Island
Resit Sendag, University of Rhode Island
Joshua J. Yi, Freescale Semiconductor, Inc.
Motivation
• Previous work on wrong-path (WP) effects in uniprocessors
  • Positive effect: prefetching
    • Up to 20% better performance for 181.mcf (SPECint 2000)
  • Negative effect: pollution
    • L1 and L2 cache pollution
    • Extra traffic
• It is important to simulate WP references, especially for some applications
• What about WP effects in multiple-CMP systems?
Outline
• Wrong-Path Effects in SMPs and multi-CMPs
• Simulation Methodology
• Evaluation Results
• Conclusion
Wrong-path effects in SMPs – 0 / 4
• Broadcast (snoop)-based and directory-based SMP systems
• MSI, MOSI, MESI, MOESI cache coherence protocols
• The same issues as in uniprocessors apply
  • Pollution effect
  • Prefetching effect
  • Extra cache/memory traffic
• Unlike in uniprocessors, WP references also cause:
  • Extra coherence traffic: data, invalidations, write-backs, acknowledgements
  • Additional cache block state transitions
Wrong-path effects in SMPs – 1 / 4
• Replacements
[Figure: initial states; A speculatively replaces B. A is a wrong-path block!]
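The replacement scenario on this slide, together with the prefetching effect mentioned earlier, can be illustrated with a toy cache. This is only a sketch under stated assumptions: the fully associative LRU cache, the one-line capacity, and the function name `correct_path_misses` are hypothetical and are not taken from the simulator used in the talk.

```python
from collections import OrderedDict

def correct_path_misses(trace, capacity=1):
    """Count correct-path misses in a toy fully associative LRU cache.

    trace is a list of (block, is_wrong_path) pairs. Wrong-path accesses
    fill the cache like any other access, but their own misses are not
    counted, since only correct-path misses matter for performance here.
    """
    lines = OrderedDict()  # cached blocks, ordered from LRU to MRU
    misses = 0
    for block, is_wrong_path in trace:
        if block in lines:
            lines.move_to_end(block)   # refresh the LRU position on a hit
            continue
        if not is_wrong_path:
            misses += 1
        if len(lines) >= capacity:
            lines.popitem(last=False)  # evict the least recently used block
        lines[block] = None
    return misses

# Pollution: wrong-path block A replaces B, so the next access to B misses.
baseline = correct_path_misses([("B", False), ("B", False)])
polluted = correct_path_misses([("B", False), ("A", True), ("B", False)])

# Prefetching: a wrong-path access to A turns the later correct-path access into a hit.
no_prefetch = correct_path_misses([("A", False)])
prefetched = correct_path_misses([("A", True), ("A", False)])
```

In this toy setup the same mechanism (a wrong-path fill) either adds a correct-path miss or removes one, depending on whether the speculatively fetched block is used later.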
Wrong-path effects in SMPs – 2 / 4
• Write-backs
[Figure labels: write-back dirty copy of B; M -> S; write-back dirty copy of A; only for MESI (or MSI)]
Wrong-path effects in SMPs – 3 / 4
• Invalidations
[Figure labels: P1 loses its write privileges for block A; P1 asks for grant to write and sends invalidation]
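The write-back and invalidation cases can be sketched with a toy two-processor MSI model. Everything here is an illustrative assumption rather than the protocol evaluated in the talk: the class `ToyMSIBus`, its event names, and the way transitions are counted are hypothetical, and processor index 0 stands in for P1.

```python
from collections import Counter

class ToyMSIBus:
    """Toy snooping MSI model that counts coherence events (illustration only)."""

    def __init__(self, num_procs):
        # state[p][block] is "M" or "S"; a missing key means Invalid.
        self.state = [dict() for _ in range(num_procs)]
        self.events = Counter()

    def load(self, proc, block):
        if block in self.state[proc]:
            return  # hit in M or S: no bus traffic
        for other, lines in enumerate(self.state):
            if other != proc and lines.get(block) == "M":
                # Remote dirty copy: the owner writes back and downgrades M -> S.
                lines[block] = "S"
                self.events["write_back"] += 1
                self.events["state_transition"] += 1
        self.state[proc][block] = "S"  # I -> S at the requester
        self.events["state_transition"] += 1

    def store(self, proc, block):
        if self.state[proc].get(block) == "M":
            return  # write permission already held
        for other, lines in enumerate(self.state):
            if other != proc and block in lines:
                if lines[block] == "M":
                    self.events["write_back"] += 1
                del lines[block]  # M or S -> I at the remote cache
                self.events["invalidation"] += 1
                self.events["state_transition"] += 1
        self.state[proc][block] = "M"  # I or S -> M at the requester
        self.events["state_transition"] += 1

def run(with_wrong_path):
    bus = ToyMSIBus(num_procs=2)
    bus.store(0, "A")      # P1 (index 0) holds A in M
    if with_wrong_path:
        bus.load(1, "A")   # wrong-path load of A by the other processor
    bus.store(0, "A")      # P1 writes A again
    return bus.events

without_wp = run(False)
with_wp = run(True)
```

Comparing `without_wp` and `with_wp` shows the pattern from the slides: the single wrong-path load forces a write-back and an M -> S downgrade at P1, and P1 must then invalidate the other copy before it can write again.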
Wrong-path effects in SMPs – 4 / 4
• Data/bus and coherence traffic increases
  • L1 references
  • L2 references
  • Coherence traffic: snoop and directory requests for data and invalidations
• Power consumption increases
  • Due to extra cache references, coherence traffic, and cache block state transitions
• Resource contention
  • WP references compete with correct-path references for resources
  • Unlike in uniprocessors, service buffers fill up more often
    • Critical when there are many cache-to-cache transfers
WP effects in Multiple-CMPs – 0 / 2
• CMP node and a 4-CMP system
• We studied inclusive L1 and L2 caches
• The L2 cache also tracks the coherence of cache blocks in L1
WP effects in Multiple-CMPs – 1 / 2
• State transitions when an SO line in the L2 cache is replaced
[Figure: state diagram; states shown: SO, S, OIV, OIN, I]
WP effects in Multiple-CMPs – 2 / 2
• State transitions when an MT line in the L2 cache receives a WP request
[Figure: state diagram; states shown: MT, MO, M, S, SO]
Experimental Methodology
• GEMS simulator (Wisconsin Multifacet Group)
  • Based on Virtutech SIMICS
  • Aggressive out-of-order superscalar processor
  • Detailed shared-memory model
• We evaluate a 16-processor SPARC V9 system (4 and 8 CMPs) running unmodified Solaris 9
• We evaluate a 2-level MOSI directory coherence protocol
  • MOSI: Modified, Owned, Shared, Invalid
• We track speculatively generated memory references and mark them as wrong-path once the branch misprediction is known
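The tracking step in the last bullet can be sketched as follows. This is a minimal illustration, not the simulator's implementation: the class `WrongPathTracker`, its method names, and the simplification that each reference is attributed to a single unresolved branch are all assumptions made for the example.

```python
class WrongPathTracker:
    """Tags speculative memory references and classifies them at branch resolution."""

    def __init__(self):
        self.pending = {}       # branch_id -> addresses issued under that unresolved branch
        self.correct_path = []  # references confirmed to be on the correct path
        self.wrong_path = []    # references issued under a mispredicted branch

    def issue(self, branch_id, address):
        # Record a speculatively generated reference under its controlling branch.
        self.pending.setdefault(branch_id, []).append(address)

    def resolve(self, branch_id, mispredicted):
        # Once the branch outcome is known, classify every reference it guarded.
        refs = self.pending.pop(branch_id, [])
        if mispredicted:
            self.wrong_path.extend(refs)
        else:
            self.correct_path.extend(refs)

tracker = WrongPathTracker()
tracker.issue(branch_id=1, address=0x1000)
tracker.issue(branch_id=1, address=0x1040)
tracker.issue(branch_id=2, address=0x2000)
tracker.resolve(branch_id=1, mispredicted=True)   # both references become wrong-path
tracker.resolve(branch_id=2, mispredicted=False)  # this reference is correct-path
```

The key point is that a reference cannot be classified when it is issued; it stays pending until the branch it depends on resolves.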
Evaluation Results 1 / 5 -- L1 and L2 Cache Traffic
[Figure panels: 4 CMPs, 8 CMPs]
• Total memory references increase by 16% and 14% for 4 and 8 CMPs, respectively.
• L2 cache references increase by 35% and 36%, respectively.
• For em3d, the number of L1 misses increases by as much as 70%.
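One plausible reading of these percentages is the number of extra wrong-path events relative to the correct-path-only count. The slides do not state the baseline, so that reading is an assumption, as are the helper name `percent_increase` and the counts in the usage line, which are made-up illustrative numbers and not measured results.

```python
def percent_increase(correct_path_count, wrong_path_count):
    """Return extra wrong-path events as a percentage of the correct-path count."""
    if correct_path_count <= 0:
        raise ValueError("correct_path_count must be positive")
    return 100.0 * wrong_path_count / correct_path_count

# Hypothetical counts for illustration only: 160 extra wrong-path references
# on top of 1000 correct-path references.
example = percent_increase(1000, 160)
```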
Evaluation Results 2 / 5 -- Coherence Traffic
[Figure panels: 4 CMPs, 8 CMPs]
• Internal -- 36%, External -- 30%
Evaluation Results 3 / 5 -- L1 and L2 Cache Replacements
• L1 -- 30%, L2 -- 17%
• Potential cache performance impact
Evaluation Results 4 / 5 -- Write Misses
• 4 CMPs: on average 7%
• 8 CMPs: on average 4%
Evaluation Results 5 / 5 -- Cache Line State Transitions
• 4 CMPs
  • Internal: 2% to 13%
  • External: 1% to 9%
• 8 CMPs
  • Internal: 2% to 17%
  • External: 1% to 10%
Conclusion
• It is important to model WP memory references in cache-coherent multi-CMP systems
• In multi-CMPs, WP references affect not only the performance of individual processors, through prefetching and pollution, but also the performance of the entire system, by increasing:
  • cache coherence transactions
  • cache block state transitions
  • write-backs
  • invalidations
  • resource contention
• For a workload with many cache-to-cache transfers, WP references can significantly affect coherence actions.
The End
Thank you!