280 likes | 384 Views
Detecting and Eliminating Potential Violation of Sequential Consistency for concurrent C/C++ program. Duan Yuelu, Feng Xiaobing, Pen-chung Yew. Outline. Motivation Approach & Implementation Results Related Work Conclusion. Motivation.
E N D
Detecting and Eliminating Potential Violation of Sequential Consistency for concurrent C/C++ program Duan Yuelu, Feng Xiaobing, Pen-chung Yew
Outline • Motivation • Approach & Implementation • Results • Related Work • Conclusion
Motivation • Programmers develop “low-lock” code for better performance • lock is expensive • data race are deliberately employed • require sequential consistency (SC) model • Such code might fail in relaxed consistency (RC) models • E.g. Double Checked Locking (DCL) for lazy initialized singleton
Example 1 (a):Lazy initialized singleton Object::Object() { this.field = 100; } Object Object::getInstance() { if (!_instance) _instance = new Object(); return _instance; } • Object Object::getInstance() { • lock(l); • if (!_instance) • _instance = new Object(); • unlock(l); • return _instance; • } work for multi-thread, but is expensive... void Object::useInstance() { Object ins; ins = Object::getInstance(); int f = ins.getField(); } work only for single thread
Object Object::getInstance() { if (!_instance) { lock(l); if (!_instance) _instance = new Object(); unlock(l); } return _instance; } If the architecture is SC, then it works correctly, with better performance than (a). (b): Double Checked Locking for lazy initialized singleton But, how about running on RC models that allows write-write reorder?
A possible execution interleave…correct! Initializer Thread (T1) Reader Thread(T2) • Object Object::getInstance() { • if (!_instance) { • lock(l); • if (!_instance) { • temp = malloc(..); • A1: temp->field = 100; • A2: _instance = temp; • } • unlock(l); • } • return _instance; • } B1: if (!_instance) {…} … B2: read _instance->field; Data races are employed, since these accesses are improperly synchronized
But, how about reorder write-write? Initializer Thread (T1) Reader Thread(T2) Object Object::getInstance() { if (!_instance) { lock(l); if (!_instance) { temp = malloc(..); temp->field = 100; A2: _instance = temp; A1: temp->field = 100; } … B1: if (!_instance) {…} … B2: read _instance->field; Get Un-initialized value of instance->field Violate Sequential Consistency
bug pattern:Potential Violation of Sequential Consistency (PVSC),- since these defects might cause SC violation.How to detect and eliminate PVSC bugs?- Basically, we combine Shasha/Snir’s conflict graph and delay set theory with existing data race detection scheme.
Outline • Motivation • Approach & Implementation • Results • Related Work • Conclusion
our scheme • (1) Construct Race Graph • (2) Find cycles in it • A cycle in race graph corresponds to a PVSC bug • (3) Compute delay set • (4) Insert memory ordering fences
Constructing Race Graph • For all the instructions that executed in a particular execution of a program P: • Add program order edge for instructions in each thread. • Add race edge for each data race. Thread 1 Thread 2 wr a rd b Program order edge wr b rd a Race edge
lock(l); • if (!_instance) { • temp = malloc(..); • temp->field = 100; • _instance = temp; • } • unlock(l); • } if (!_instance) {…} … read _instance->field; A: wr a C: rd b Example 1. Race Graph for DCL … B: wr b D: rd a
Find cycles in race graph • Theorem 1. A cycle in race graph corresponds to a PVSC bug. • Proof: If a cycle is found in race graph, then it is possible to get a non-sequential-consistent execution by letting the race order be consistent with the cycle. E.g, we can get a non-SC execution E={B->C, D->A} from the cycle A->B->C->D->A in previous example.
Compute delay set • Delay lemma : Any execution should be consistent with a delay set D. [Shasha/Snir] • Theorem 2. Let D be the delay set which contains all the program order edge of the race cycles in race graph. Then D enforces sequential consistency for the executions that generates D. • Proof: Omitted
Insert memory ordering fences • A fence instruction delays the issue of an instruction until all previous instructions completed. • Insert a fence for each delay in D. • Then D can be enforced, and, • Detected PVSC can be eliminated.
Examples for above 3 steps… Thread 1 Thread 2 wr a rd a wr b rd b Fig. 1:No cycles, no PVSC, no fence is needed. (Implies that any execution on RC is sequential consistent, thus we don’t need fences.)
Thread 1 Thread 2 Thread 3 A: a=1 B: if (a) C: b = 1 D: if (b) Initially a = b = 0 E: R1=a Fig. 2:contains a cycle A->B->C->D->E->A, PVSC. It’s possible to get the execution {A->B, C->D,E->A} which violates SC and results in {a=1,b=1, R1=0}. If we insert fences between A and B, C and D, then PVSC is eliminated.
Fig. 3: Corrected version of DCL for lazy initialized singleton. • Object getInstance() { • Object *tmp = _instance; • Fence(); • if (!tmp) { • lock(l); • tmp = _instance; • if (!tmp) • tmp = new Object(); • Fence(); • _instance = tmp; • unlock(l); • } • return _instance; • }
Optimization • To handle real-world applications with • Long execution time • Many threads • We convert the race graph into PC race graph • Combine nodes with same PC into one node. • The graph contains N nodes, where N equals the number of race access instructions. • Adopt SCC algorithm on PC race graph. Each SCC corresponds to a PVSC bug • Can introduce false negatives.
Outline • Motivation • Approach & Implementation • Results • Related Work • Conclusion
Results • Detected PVSC bugs • Performance loss after fence insertion • Cost of PVSC detection over race detection
Performance loss of SPLASH-2 Figure 10: Performance on Intel Itanium SMP
Cost over data race detection Figure 13: Cost of PVSC detection over different race detecting algorithm
Related Work • Compiler Analysis: Conservative for C/C++ programs, insert much redundant fences which hurt performance severely. [K.Yelick@ucb, S.Midkiff@purdue] • Verification: Enumerate all possible executions fit with a RC model. Not scale to large applications. [S.Burckhardt@msr] • Data race detection: Do not concern with the problem of SC violation. [many] • Other concurrency bugs: Atomicity[AVIO,yyzhou], Correlation[MUVI,yyzhou], do not consider the PVSC problem.
Outline • Motivation • Approach & Implementation • Results • Related Work • Conclusion
Conclusion • An effective and efficient scheme of detect Potential Violation of Sequential Consistency for concurrent C/C++ programs. • Easy to be ported to the matured data race detection tools. • Retain the performance after PVSC elimination. • Scalable and low-cost. • Current limitation • Dynamic data race detection limitations: false positive and false negative. • Can be addressed with the progress in data race detection • Loop
Thanks! Suggestion?