
Predator: Predictive False Sharing Detection

Tongping Liu*, Chen Tian, Ziang Hu, Emery Berger*. *University of Massachusetts Amherst; Huawei US Research Center.


Presentation Transcript


  1. Predator: Predictive False Sharing Detection. Tongping Liu*, Chen Tian, Ziang Hu, Emery Berger*. *University of Massachusetts Amherst; Huawei US Research Center

  2. Parallelism: Expectation is Awesome. Parallel program (pseudocode) and its expected runtime. [Chart: Runtime (s), expectation.]

       int count[8];
       int W;

       void increment(int S) {
         for (in = S; in < S + W; in++)
           for (j = 0; j < 1M; j++)
             count[in]++;
       }

       int main(int THREADS) {
         W = 8 / THREADS;
         for (i = 0; i < 8; i += W)
           spawn(increment, i);
       }

  3. Parallelism: Reality is Awful. The same parallel program as on slide 2 (threads incrementing disjoint elements of int count[8]) runs far slower than expected. [Chart: Runtime (s), reality versus expectation.] The cause is false sharing, which slows the program by 13X.
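
The layout behind this slowdown can be checked directly. The sketch below is not from the slides: it assumes a 64-byte cache line, and the names `PackedCounters`, `PaddedCounters`, and `distinct_lines` are made up for illustration. It counts how many cache lines the eight counters occupy, first packed as in the slide's `int count[8]`, then with each counter padded to its own line, which is the usual fix.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>

constexpr std::size_t kLine = 64;  // assumed cache line size in bytes

// Layout from the slide: eight ints packed into 32 contiguous bytes.
// The struct is aligned to a line start so the result does not depend
// on where the object happens to be placed.
struct alignas(kLine) PackedCounters { int count[8]; };

// Padded layout: each counter is aligned to, and so fills, a whole line.
struct alignas(kLine) PaddedCounter { int value; };
struct PaddedCounters { PaddedCounter count[8]; };

static_assert(sizeof(PaddedCounter) == kLine, "one counter per cache line");

// Number of distinct cache lines that hold the start of some counter.
template <typename Counters>
std::size_t distinct_lines(const Counters& c) {
    std::set<std::uintptr_t> lines;
    for (const auto& slot : c.count)
        lines.insert(reinterpret_cast<std::uintptr_t>(&slot) / kLine);
    return lines.size();
}
```

With these assumptions the packed layout puts all eight counters on one line, so every increment by one thread can invalidate the copies held by the others, while the padded layout gives each counter a line of its own at the cost of 512 bytes instead of 32.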

  4. False Sharing in Real Applications: false sharing slows MySQL by 50%

  5. False Sharing vs. True Sharing Cache Line

  6. False Sharing vs. True Sharing. [Figure: Tasks 1 to 4 accessing a cache line, with one panel labeled False Sharing and one labeled True Sharing.]

  7. Resource Contention at Cache Line Level

  8. False Sharing Causes Performance Problems. [Figure: Thread 1 on Core 1 and Thread 2 on Core 2, each core with its own cache above main memory; an Invalidate message passes between the caches.] Cache line: basic unit of data transfer

  9. False Sharing Causes Performance Problems. [Figure: same diagram as slide 8.] Interleaved accesses cause cache invalidations

  10. False Sharing is Everywhere

        me = 1; you = 1;                    // globals
        me = new Foo; you = new Bar;        // heap
        class X { int me; int you; };       // fields
        array[me] = 12; array[you] = 13;    // array indices

  11. False Sharing is Hard to Diagnose Multiple experts worked together to diagnose MySQL scalability issue (1.5M LOC)

  12. Problems of Existing Tools
      • No precise information / false positives: WIBA’09, VEE’11, EuroSys’13, SC’13
      • Accurate and precise: OOPSLA’11 (cannot detect read-write false sharing)
      Shared problem: they only detect observed false sharing

  13. False Sharing Causes Performance Problems. Interleaved accesses → cache invalidations → performance problems. [Figure: Task 1 on Core 1 and Task 2 on Core 2, private caches with an Invalidate message, main memory.] To detect false sharing that causes performance problems, find cache lines with many cache invalidations

  14. Find Lines with Many Invalidations. [Figure: memory (globals and heap) divided into cache lines.] Track cache invalidations on each cache line

  15. Track Cache Invalidations
      • Conservative assumptions
        • Each thread runs on a different core with its private cache.
        • Infinite cache capacity.
      • Hardware-based approach
        • Needs hardware support
        • No portability
      • Simulation-based approach
        • Needs hardware info such as cache hierarchy and cache capacity
        • Very slow
      Predator: based on the memory access history of each cache line

  16. Track Cache Invalidations. Each history entry: {Thread ID, Access Type}. [Animation: over time, a sequence of read (r) and write (w) accesses by threads T1 and T2 updates a cache line's history entries and its running number of invalidations.]
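
A history-based invalidation counter can be sketched as follows. The slide only states that each entry holds {Thread ID, Access Type}; the two-entry table and the exact update rules below are assumptions chosen for illustration, and `LineHistory` is a hypothetical name. It also relies on slide 15's conservative assumptions: one core per thread and infinite cache capacity.

```cpp
#include <cstddef>

enum class AccessType { Read, Write };

struct Entry {
    int thread;
    AccessType type;
};

// Per-cache-line access history (assumed here: at most two entries).
class LineHistory {
public:
    void record(int thread, AccessType type) {
        if (type == AccessType::Read) {
            // A read never invalidates another cache's copy. Remember it
            // only if it adds a thread the history does not hold yet.
            const bool adds_thread =
                size_ == 0 || (size_ == 1 && entries_[0].thread != thread);
            if (adds_thread) entries_[size_++] = {thread, type};
            return;
        }
        // A write invalidates the line in every other cache that holds it,
        // i.e. whenever the history contains an access by another thread.
        for (std::size_t i = 0; i < size_; ++i) {
            if (entries_[i].thread != thread) {
                ++invalidations_;
                break;
            }
        }
        // After a write, only the writer holds a valid copy.
        entries_[0] = {thread, type};
        size_ = 1;
    }

    std::size_t invalidations() const { return invalidations_; }

private:
    Entry entries_[2] = {};
    std::size_t size_ = 0;
    std::size_t invalidations_ = 0;
};
```

Under these rules, alternating writes by two threads (T1 w, T2 w, T1 w) count two invalidations, while any number of accesses by a single thread count none. Keeping the history this small makes the per-access work constant, which matters when every memory access is instrumented.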

  17. Predator Components
      • Compiler instrumentation: instruments every memory read/write access
      • Runtime system: collects memory accesses and reports false sharing

  18. Detect Problems Correctly & Precisely
      • Correctly: no false alarms
        • Track memory accesses on each word
      • Precisely
        • Global variables
        • Heap objects: pinpoint the line of memory allocation
      [Figure: Tasks 1 to 4 accessing a cache line, with one panel labeled False Sharing and one labeled True Sharing.]
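
Word-level tracking is what separates false sharing from true sharing once a line is known to suffer many invalidations. The sketch below is one plausible classification, not the tool's actual logic: `LineWords`, `Sharing`, and the 16-words-per-line constant (64-byte lines, 4-byte words) are assumptions for illustration. A line with any written word touched by several threads is treated as true sharing, so mixed lines are not reported as false alarms.

```cpp
#include <array>
#include <cstddef>
#include <set>

constexpr std::size_t kWordsPerLine = 16;  // assumed: 64-byte line, 4-byte words

enum class Sharing { None, FalseSharing, TrueSharing };

struct WordState {
    std::set<int> threads;  // threads that accessed this word
    bool written = false;   // whether any access was a write
};

class LineWords {
public:
    void record(std::size_t word, int thread, bool is_write) {
        WordState& w = words_.at(word);  // throws std::out_of_range on a bad index
        w.threads.insert(thread);
        w.written = w.written || is_write;
    }

    Sharing classify() const {
        std::set<int> all_threads;
        bool any_write = false;
        for (const WordState& w : words_) {
            // The same word written and used by several threads: true sharing.
            if (w.written && w.threads.size() > 1) return Sharing::TrueSharing;
            all_threads.insert(w.threads.begin(), w.threads.end());
            any_write = any_write || w.written;
        }
        // Several threads on the line, at least one write, no contended word.
        return (any_write && all_threads.size() > 1) ? Sharing::FalseSharing
                                                     : Sharing::None;
    }

private:
    std::array<WordState, kWordsPerLine> words_{};
};
```

For example, thread 1 writing word 0 and thread 2 writing word 1 classifies as false sharing, while both threads touching word 0 with at least one write classifies as true sharing.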

  19. Predator’s Report

  20. Why do we need prediction?

  21. Necessity of False Sharing Prediction. [Figure: data accessed by Thread 1 and Thread 2 shown under several placements relative to Cache line 1 and Cache line 2; some placements are marked False Sharing.]

  22. Properties Affecting False Sharing Occurrence
      • Change of memory layout
        • 32-bit platform → 64-bit platform
        • Different memory allocator
        • Different compiler or optimization
        • Different allocation order caused by changing the code, e.g., adding a printf
      • Running on hardware with a different cache line size

  23. Example of False Sharing Sensitivity. Cache line size = 64 bytes. [Figure: the same data placed in memory at starting offsets 0, 8, and 56; colors represent threads.]
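
The sensitivity to starting offset can be reproduced with a small calculation. The sketch below assumes consecutive per-thread elements of 64 bytes each; the slide gives only the cache line size and the offsets, so the element size and the helper name `shared_lines` are illustrative assumptions. It counts the cache lines that hold bytes belonging to more than one element.

```cpp
#include <cstddef>
#include <map>

// Counts cache lines that contain bytes from more than one element, for
// n consecutive elements of elem_size bytes starting at byte offset start.
std::size_t shared_lines(std::size_t start, std::size_t elem_size,
                         std::size_t n, std::size_t line_size = 64) {
    if (elem_size == 0 || line_size == 0) return 0;
    std::map<std::size_t, std::size_t> owners;  // line index -> elements on it
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t first = start + i * elem_size;  // first byte
        const std::size_t last = first + elem_size - 1;   // last byte
        for (std::size_t line = first / line_size; line <= last / line_size; ++line)
            ++owners[line];
    }
    std::size_t shared = 0;
    for (const auto& entry : owners)
        if (entry.second > 1) ++shared;
    return shared;
}
```

With four such elements, offset 0 gives no shared lines, while offsets 8 and 56 each give three, one at every boundary between neighboring elements. Doubling the line size to 128 bytes reintroduces sharing even at offset 0, which is why a run on one machine says little about another.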

  24. Example of False Sharing Sensitivity. Predator predicts false sharing problems even when they do not occur in the observed run

  25. Prediction Based on Virtual Cache Lines. [Figure: accesses by Thread 1 and Thread 2. Real case: Cache line 1 and Cache line 2. False Sharing Prediction 1: Virtual cache line 1. False Sharing Prediction 2: Virtual cache line 1 and Virtual cache line 2.]

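  26. Track Invalidations on Virtual Cache Lines. Two accesses X and Y at distance d are tracked on a virtual line when:
      • d < sz, where sz is the cache line size
      • X and Y come from different threads and at least one of them is a write
      [Figure: of the virtual lines that could cover X and Y, the tracked virtual line leaves (sz-d)/2 bytes before X and (sz-d)/2 bytes after Y; the other virtual lines are not tracked.]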
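
The two conditions and the centered placement of the tracked line translate into a few lines of arithmetic. In this sketch `HotAccess`, `VirtualLine`, `is_candidate`, and `tracked_line` are hypothetical names, addresses are plain byte addresses, and clamping the start at address 0 is an added assumption.

```cpp
#include <cstdint>

struct HotAccess {
    std::uint64_t addr;  // byte address of the accessed word
    int thread;          // accessing thread
    bool is_write;       // whether the access is a write
};

struct VirtualLine {
    std::uint64_t begin;  // inclusive
    std::uint64_t end;    // exclusive
};

// Conditions from the slide: d < sz, different threads, at least one write.
bool is_candidate(const HotAccess& x, const HotAccess& y, std::uint64_t sz) {
    const std::uint64_t d = x.addr < y.addr ? y.addr - x.addr : x.addr - y.addr;
    return d < sz && x.thread != y.thread && (x.is_write || y.is_write);
}

// Virtual line of size sz with (sz - d) / 2 bytes before the lower address
// and the rest after the higher one. Callers should check is_candidate first.
VirtualLine tracked_line(std::uint64_t x, std::uint64_t y, std::uint64_t sz) {
    const std::uint64_t lo = x < y ? x : y;
    const std::uint64_t hi = x < y ? y : x;
    const std::uint64_t d = hi - lo;
    const std::uint64_t pad = d < sz ? (sz - d) / 2 : 0;
    const std::uint64_t begin = lo > pad ? lo - pad : 0;  // clamp at address 0
    return {begin, begin + sz};
}
```

For example, with sz = 64, accesses at addresses 56 and 72 lie on different physical lines, but d = 16 gives a tracked virtual line [32, 96) that contains both. Tracking only this one line, rather than every virtual line covering X and Y, keeps the bookkeeping per pair of accesses constant.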

  27. Benchmark Results

  28. Real Applications Results
      • MySQL
        • Problem: false sharing occurs when different threads update the shared bitmap simultaneously.
        • Performance improves 180% after fixes.
      • Boost library
        • Problem: “there will be 16 spinlocks per cache line”
        • Performance improves about 100%.

  29. Performance Overhead of Predator: 5.6X

  30. [Summary figure: compiler instrumentation and runtime system; Thread 1 and Thread 2 on Core 1 and Core 2 with private caches, an Invalidate message, and main memory; the real case (Cache line 1, Cache line 2) beside False Sharing Prediction 1 and False Sharing Prediction 2 on virtual cache lines; precise report.]

  31. False Sharing is Hard to Diagnose Multiple experts worked together to diagnose MySQL scalability issue (1.5M LOC)

  32. Detailed Prediction Algorithm. Step 1: find suspected cache lines

  33. Detailed Prediction Algorithm. Step 1: find suspected cache lines. Step 2: track detailed memory accesses

  34. Detailed Prediction Algorithm. Step 1: find suspected cache lines. Step 2: track detailed memory accesses. Step 3: predict based on hot accesses: if two hot accesses X and Y at distance d satisfy d < sz and come from different threads, there is potential false sharing

  35. Step 4: track cache invalidations on the virtual line. [Figure: X and Y at distance d; the tracked virtual line leaves (sz-d)/2 bytes before X and (sz-d)/2 bytes after Y; the other virtual lines are not tracked.]
