1 / 28

Duan Yuelu, Feng Xiaobing, Pen-chung Yew

Detecting and Eliminating Potential Violation of Sequential Consistency for concurrent C/C++ program. Duan Yuelu, Feng Xiaobing, Pen-chung Yew. Outline. Motivation Approach & Implementation Results Related Work Conclusion. Motivation.

claire
Download Presentation

Duan Yuelu, Feng Xiaobing, Pen-chung Yew

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Detecting and Eliminating Potential Violation of Sequential Consistency for concurrent C/C++ program Duan Yuelu, Feng Xiaobing, Pen-chung Yew

  2. Outline • Motivation • Approach & Implementation • Results • Related Work • Conclusion

  3. Motivation • Programmers develop “low-lock” code for better performance • lock is expensive • data race are deliberately employed • require sequential consistency (SC) model • Such code might fail in relaxed consistency (RC) models • E.g. Double Checked Locking (DCL) for lazy initialized singleton

  4. Example 1 (a):Lazy initialized singleton Object::Object() { this.field = 100; } Object Object::getInstance() { if (!_instance) _instance = new Object(); return _instance; } • Object Object::getInstance() { • lock(l); • if (!_instance) • _instance = new Object(); • unlock(l); • return _instance; • } work for multi-thread, but is expensive... void Object::useInstance() { Object ins; ins = Object::getInstance(); int f = ins.getField(); } work only for single thread

  5. Object Object::getInstance() { if (!_instance) { lock(l); if (!_instance) _instance = new Object(); unlock(l); } return _instance; } If the architecture is SC, then it works correctly, with better performance than (a). (b): Double Checked Locking for lazy initialized singleton But, how about running on RC models that allows write-write reorder?

  6. A possible execution interleave…correct! Initializer Thread (T1) Reader Thread(T2) • Object Object::getInstance() { • if (!_instance) { • lock(l); • if (!_instance) { • temp = malloc(..); • A1: temp->field = 100; • A2: _instance = temp; • } • unlock(l); • } • return _instance; • } B1: if (!_instance) {…} … B2: read _instance->field; Data races are employed, since these accesses are improperly synchronized

  7. But, how about reorder write-write? Initializer Thread (T1) Reader Thread(T2) Object Object::getInstance() { if (!_instance) { lock(l); if (!_instance) { temp = malloc(..); temp->field = 100; A2: _instance = temp; A1: temp->field = 100; } … B1: if (!_instance) {…} … B2: read _instance->field; Get Un-initialized value of instance->field Violate Sequential Consistency

  8. bug pattern:Potential Violation of Sequential Consistency (PVSC),- since these defects might cause SC violation.How to detect and eliminate PVSC bugs?- Basically, we combine Shasha/Snir’s conflict graph and delay set theory with existing data race detection scheme.

  9. Outline • Motivation • Approach & Implementation • Results • Related Work • Conclusion

  10. our scheme • (1) Construct Race Graph • (2) Find cycles in it • A cycle in race graph corresponds to a PVSC bug • (3) Compute delay set • (4) Insert memory ordering fences

  11. Constructing Race Graph • For all the instructions that executed in a particular execution of a program P: • Add program order edge for instructions in each thread. • Add race edge for each data race. Thread 1 Thread 2 wr a rd b Program order edge wr b rd a Race edge

  12. lock(l); • if (!_instance) { • temp = malloc(..); • temp->field = 100; • _instance = temp; • } • unlock(l); • } if (!_instance) {…} … read _instance->field; A: wr a C: rd b Example 1. Race Graph for DCL … B: wr b D: rd a

  13. Find cycles in race graph • Theorem 1. A cycle in race graph corresponds to a PVSC bug. • Proof: If a cycle is found in race graph, then it is possible to get a non-sequential-consistent execution by letting the race order be consistent with the cycle. E.g, we can get a non-SC execution E={B->C, D->A} from the cycle A->B->C->D->A in previous example.

  14. Compute delay set • Delay lemma : Any execution should be consistent with a delay set D. [Shasha/Snir] • Theorem 2. Let D be the delay set which contains all the program order edge of the race cycles in race graph. Then D enforces sequential consistency for the executions that generates D. • Proof: Omitted

  15. Insert memory ordering fences • A fence instruction delays the issue of an instruction until all previous instructions completed. • Insert a fence for each delay in D. • Then D can be enforced, and, • Detected PVSC can be eliminated.

  16. Examples for above 3 steps… Thread 1 Thread 2 wr a rd a wr b rd b Fig. 1:No cycles, no PVSC, no fence is needed. (Implies that any execution on RC is sequential consistent, thus we don’t need fences.)

  17. Thread 1 Thread 2 Thread 3 A: a=1 B: if (a) C: b = 1 D: if (b) Initially a = b = 0 E: R1=a Fig. 2:contains a cycle A->B->C->D->E->A, PVSC. It’s possible to get the execution {A->B, C->D,E->A} which violates SC and results in {a=1,b=1, R1=0}. If we insert fences between A and B, C and D, then PVSC is eliminated.

  18. Fig. 3: Corrected version of DCL for lazy initialized singleton. • Object getInstance() { • Object *tmp = _instance; • Fence(); • if (!tmp) { • lock(l); • tmp = _instance; • if (!tmp) • tmp = new Object(); • Fence(); • _instance = tmp; • unlock(l); • } • return _instance; • }

  19. Optimization • To handle real-world applications with • Long execution time • Many threads • We convert the race graph into PC race graph • Combine nodes with same PC into one node. • The graph contains N nodes, where N equals the number of race access instructions. • Adopt SCC algorithm on PC race graph. Each SCC corresponds to a PVSC bug • Can introduce false negatives.

  20. Outline • Motivation • Approach & Implementation • Results • Related Work • Conclusion

  21. Results • Detected PVSC bugs • Performance loss after fence insertion • Cost of PVSC detection over race detection

  22. Part of detected bugs

  23. Performance loss of SPLASH-2 Figure 10: Performance on Intel Itanium SMP

  24. Cost over data race detection Figure 13: Cost of PVSC detection over different race detecting algorithm

  25. Related Work • Compiler Analysis: Conservative for C/C++ programs, insert much redundant fences which hurt performance severely. [K.Yelick@ucb, S.Midkiff@purdue] • Verification: Enumerate all possible executions fit with a RC model. Not scale to large applications. [S.Burckhardt@msr] • Data race detection: Do not concern with the problem of SC violation. [many] • Other concurrency bugs: Atomicity[AVIO,yyzhou], Correlation[MUVI,yyzhou], do not consider the PVSC problem.

  26. Outline • Motivation • Approach & Implementation • Results • Related Work • Conclusion

  27. Conclusion • An effective and efficient scheme of detect Potential Violation of Sequential Consistency for concurrent C/C++ programs. • Easy to be ported to the matured data race detection tools. • Retain the performance after PVSC elimination. • Scalable and low-cost. • Current limitation • Dynamic data race detection limitations: false positive and false negative. • Can be addressed with the progress in data race detection • Loop

  28. Thanks! Suggestion?

More Related