Whose Cache Line Is It Anyway? Operating System Support for Live Detection and Repair of False Sharing. Mihir Nanavati, Mark Spear, Nathan Taylor, Shriram Rajagopalan, Dutch T. Meyer, William Aiello, and Andrew Warfield. University of British Columbia.
Whose Cache Line Is It Anyway? Operating System Support for Live Detection and Repair of False Sharing. Mihir Nanavati, Mark Spear, Nathan Taylor, Shriram Rajagopalan, Dutch T. Meyer, William Aiello, and Andrew Warfield. University of British Columbia
Control VM (Dom0) Target System Xen + Hardware Memory
Dynamic Detection and Mitigation of False Sharing
T1 T2 Write 0x300 Write 0x308 Read 0x300 Cache 0x300 0x340 Main Memory
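The addresses on this slide can be checked with a little arithmetic. The sketch below assumes 64-byte cache lines (consistent with the 0x300 to 0x340 span shown, but still an assumption) and uses hypothetical helper names. It shows that 0x300 and 0x308 fall in the same line, so writes to either one invalidate the other core's copy.

```python
CACHE_LINE_SIZE = 64  # assumed line size; 0x340 - 0x300 == 0x40 == 64


def cache_line_base(addr, line_size=CACHE_LINE_SIZE):
    """Round an address down to the start of its cache line."""
    return addr & ~(line_size - 1)


def same_cache_line(a, b, line_size=CACHE_LINE_SIZE):
    """True if both addresses live in the same cache line."""
    return cache_line_base(a, line_size) == cache_line_base(b, line_size)


if __name__ == "__main__":
    print(hex(cache_line_base(0x308)))      # line that holds 0x308
    print(same_cache_line(0x300, 0x308))    # the two writes from the slide
    print(same_cache_line(0x300, 0x340))    # start of the next line
```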
Cache Line C Structure With Padding With Allocator Metadata
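To make the effect of padding concrete, here is a minimal sketch using Python's `ctypes` to mirror a C structure layout. The structure and field names are hypothetical and the 64-byte line size is assumed. Offsets are relative to the start of the structure, so the separation only holds if the structure itself starts on a cache-line boundary.

```python
import ctypes

CACHE_LINE_SIZE = 64  # assumed line size


class PackedCounters(ctypes.Structure):
    # Two per-thread counters laid out back to back, as a C compiler would.
    _fields_ = [
        ("t1_count", ctypes.c_uint64),
        ("t2_count", ctypes.c_uint64),
    ]


class PaddedCounters(ctypes.Structure):
    # Explicit padding pushes the second counter to the next cache line.
    _fields_ = [
        ("t1_count", ctypes.c_uint64),
        ("_pad", ctypes.c_char * (CACHE_LINE_SIZE - ctypes.sizeof(ctypes.c_uint64))),
        ("t2_count", ctypes.c_uint64),
    ]


def line_index(offset):
    """Cache-line index of a field offset, assuming a line-aligned structure."""
    return offset // CACHE_LINE_SIZE


if __name__ == "__main__":
    for cls in (PackedCounters, PaddedCounters):
        print(cls.__name__, cls.t1_count.offset, cls.t2_count.offset)
```

Padding trades memory for isolation: the padded structure is larger, which is part of why fixing false sharing by hand in source is not always attractive.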
[Figure: Time (s) vs. No. of Cores, with a 7.5x annotation]
Linux Kernel [OSDI ’10], JVM [Dice, 2012], Software Transactional Memory [HPCA ’06]
Dynamic Detection and Mitigation of False Sharing
Modify access locations Modify access frequency Sheriff [OOPSLA ’11]
T1 T2 Isolated Page Underlay Page
Dynamic Detection and Mitigation of False Sharing
Persistent, high-frequency false sharing
Detection stages, from cheapest to most precise:
Performance Counters (very fast and imprecise): Does contention exist?
Log Page Reads (fast and somewhat precise): What pages are involved in the contention?
Instruction Emulation (slow and precise): What are the byte ranges being accessed?
Log Analysis: Does this signify false sharing? Produces rules for the remapper.
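The log-analysis question "does this signify false sharing?" can be illustrated with a small classifier over access records. This is a sketch under stated assumptions, not the analysis from the talk: the record format, the 64-byte line size, the 8-byte access width, and the assignment of the earlier slide's accesses to T1 and T2 are all hypothetical. A line is flagged as falsely shared when several threads touch it, at least one of them writes, and no thread's written bytes overlap another thread's accessed bytes.

```python
from collections import defaultdict
from typing import NamedTuple

CACHE_LINE_SIZE = 64  # assumed line size


class Access(NamedTuple):
    thread: int
    addr: int
    size: int
    is_write: bool


def _classify(threads):
    """Classify one cache line from its per-thread byte sets."""
    ids = list(threads)
    if len(ids) < 2 or not any(threads[t]["written"] for t in ids):
        return "no contention"
    for i, t in enumerate(ids):
        for u in ids[i + 1:]:
            # Overlap between one thread's writes and another's accesses
            # means the threads really do share data.
            if (threads[t]["written"] & threads[u]["bytes"]
                    or threads[u]["written"] & threads[t]["bytes"]):
                return "true sharing"
    return "false sharing"


def classify_lines(accesses):
    """Map each touched cache-line base address to a classification."""
    per_line = defaultdict(
        lambda: defaultdict(lambda: {"bytes": set(), "written": set()}))
    for a in accesses:
        for byte in range(a.addr, a.addr + a.size):
            line = byte & ~(CACHE_LINE_SIZE - 1)
            entry = per_line[line][a.thread]
            entry["bytes"].add(byte)
            if a.is_write:
                entry["written"].add(byte)
    return {line: _classify(threads) for line, threads in per_line.items()}


if __name__ == "__main__":
    log = [
        Access(thread=1, addr=0x300, size=8, is_write=True),
        Access(thread=2, addr=0x308, size=8, is_write=True),
        Access(thread=1, addr=0x300, size=8, is_write=False),
    ]
    print(classify_lines(log))
```

Tracking individual bytes keeps the sketch short; a real analysis would more likely work on byte ranges or intervals.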
Dynamic Detection and Mitigation of False Sharing
T1 T2 Isolated Page Underlay Page
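One way to picture the isolated page and the underlay page is as an address translation driven by byte-range rules. The sketch below is illustrative only: the rule format, the function name, and the target address 0x5000 are hypothetical, and it does not claim to reflect how the system in the talk stores or applies its rules. Bytes covered by a rule are redirected to an isolated page; every other byte stays on the underlay page.

```python
from typing import NamedTuple

CACHE_LINE_SIZE = 64  # assumed line size


class RemapRule(NamedTuple):
    start: int          # first remapped byte, as an address in the underlay page
    length: int         # number of bytes covered by the rule
    isolated_base: int  # where those bytes live in the isolated page


def translate(addr, rules):
    """Return the address an access should use after remapping."""
    for rule in rules:
        if rule.start <= addr < rule.start + rule.length:
            return rule.isolated_base + (addr - rule.start)
    return addr  # not remapped: the access stays on the underlay page


if __name__ == "__main__":
    # Hypothetical rule: move the 8 bytes at 0x308 to an isolated page at 0x5000.
    rules = [RemapRule(start=0x308, length=8, isolated_base=0x5000)]
    for addr in (0x300, 0x308, 0x30C):
        print(hex(addr), "->", hex(translate(addr, rules)))
```

With this rule in place, 0x300 and the relocated 0x308 no longer fall in the same cache line, which is the point of remapping at byte rather than page granularity.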
Don’t be Evil Harmful
?! It’s a Fault?! Original Code Code Cache
Original Code Code Cache
Catch all accesses via data path; avoid code trampolines; amortize page fault cost
T1 T2 Isolated Page Underlay Page
[Figure: Progress (million records) over Time (ms), with the point "Remappings Established" marked. Labels: Version with false sharing under Plastic; Source-fixed Version; Coherence Invalidations. Rate annotations: 160 M/sec, 110 M/sec.]
[Figure: Normalized Performance for CCBench, Phoenix, and Parsec. Annotations: 5.4x, 3.6x, 1.4x.]
Low-overhead runtime detection; byte-granularity remapping; speedup of up to 5.4x
Performance Optimizations Security Enhancements