Summary of Boehm’s “threads … as a library” + other thoughts and class discussions. CS 5966, Feb 4, 2009, Week 4
Assignment: Dining Phil code • Some versions of Dining Phil have data races • What are races? • Why are they harmful? • Are they always harmful? • P1: temp = x • P2: x = 1 (x shared between P1 and P2) • versus • the same code inside a single lock/unlock • In this case, if loads and stores of the location are atomic, both versions have the same computational semantics • Be sure of the atomicity being assumed! • (a C/Pthreads sketch of both versions follows)
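A minimal C/Pthreads sketch of the two versions discussed above (the variable and function names are illustrative, not from the assignment handout):

    #include <pthread.h>

    int shared_x = 0;                    /* shared location               */
    int temp;                            /* P1's private result           */
    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

    /* Racy version: P1 reads shared_x while P2 writes it -- a data race. */
    void *p1_racy(void *arg)   { temp = shared_x; return 0; }
    void *p2_racy(void *arg)   { shared_x = 1;    return 0; }

    /* Same accesses inside lock/unlock.  If loads and stores of an int
       are atomic on the target machine, the observable outcomes
       (temp == 0 or temp == 1) are the same as in the racy version;
       the lock only removes the data race itself.                        */
    void *p1_locked(void *arg) {
        pthread_mutex_lock(&m);   temp = shared_x;  pthread_mutex_unlock(&m);
        return 0;
    }
    void *p2_locked(void *arg) {
        pthread_mutex_lock(&m);   shared_x = 1;     pthread_mutex_unlock(&m);
        return 0;
    }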
Why we should know memory models • Memory models are not very intuitive • They take time to sink in • Something this important stays with you only through repeated exposure • Other classes do not emphasize them • They sweep the issue under the rug • They are playing ‘head in the sand’! • A memory model issue is like a grain of sand: tiny, but under an eyelid or inside a ball bearing it does real damage • Ignoring it is dangerous • It stifles understanding • We are in a world where even basic rules are being broken • Academia is about not buying into decrees • e.g., are “goto”s always harmful?
Why we should know memory models • Clearly, success in multi-core programming depends on having high-level primitives • Unfortunately nobody yet has a clue which high-level primitives “work” • i.e., are safe and predictable • and are efficient • Offering an inefficient high-level primitive does more damage • People will swing clear back to much lower-level primitives!
Why we should know memory models • Till we form a good shared understanding of which high-level primitives work well, we must be prepared to evaluate the low-level effects of the existing ones • The surprises that compilers add can produce such non-intuitive outcomes that we had better know they exist, and be able to resolve the issues when they arise
Why we should know memory models • Locks are expensive • In performance and in energy • If lock-free code runs dramatically faster, and there is an alternative (lock-free) style of reasoning that explains its behavior, one must clearly entertain it • We need all the tools in the kit • HW costs are becoming very skewed • Attend Uri Weiser’s talk Feb 12th • Finally, we need to understand what tools such as Inspect are actually doing!
Where mem models mattered • PCI bus ordering (producer/consumer broken) • Holzmann’s experience with multi-core SPIN • Our class experiments • The OpenMP mem model in conflict with the GCC mem model • In understanding architectural consequences • Hit-under-miss optimization in speculative execution (in snoopy buses such as HP Runway)
On the “HW / SW” split • Till the dust settles (if at all) in multi-core computing, you had better be interested in both HW and SW matters • HW matters • C-like low-level behavior matters • Later we will learn whether “comfortable” abstractions such as C# / Java are viable • Of course, when programming in the large we will prefer such high-level views; when understanding concepts, however, we need all the “nuts and bolts” exposed…
Boehm’s points • Threads are going to be increasingly used • We focus on languages such as C/C++ where threads are not built into the language – but are provided through add-on libraries • Ability to program in C/Pthreads comes through ‘painful experience’ – not through strict adherence to standards • This paper is an attempt to ameliorate that
Page 2: Thread lib, lang, compiler … • Thread semantics cannot be argued purely within the context of the libraries • They also involve • the compiler semantics • the language semantics (together, the “software” or “language” mem model) • Disciplined use of concurrency through thread APIs is fine for 98% of users • But we need to understand the remaining 2% of uses, especially in a world where we rely on MP systems for performance
P2 S3: Pthread Approach to Concur. • Sequential consistency is the intuitive model • Too expensive to implement as such • Thread 1: x = 1; r1 = y; • Thread 2: y = 1; r2 = x; (x, y initially 0) • The outcome r1 = r2 = 0 is allowed (and is what happens on today’s machines) • Compilers may reorder, subject only to intra-thread dependencies • HW may reorder, subject only to intra-thread dependencies • (a Pthreads version of this litmus test is sketched below)
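A hedged sketch of this litmus test as a Pthreads program (the thread-function names and the harness are mine). Under sequential consistency at least one of r1, r2 must end up 1; hardware store buffers or compiler reordering can make r1 == r2 == 0 observable:

    #include <pthread.h>
    #include <stdio.h>

    int x = 0, y = 0;        /* shared, both initially 0 */
    int r1, r2;              /* per-thread results       */

    void *t1(void *arg) { x = 1; r1 = y; return 0; }
    void *t2(void *arg) { y = 1; r2 = x; return 0; }

    int main(void) {
        pthread_t a, b;
        pthread_create(&a, 0, t1, 0);
        pthread_create(&b, 0, t2, 0);
        pthread_join(a, 0);
        pthread_join(b, 0);
        /* SC forbids r1 == 0 && r2 == 0; relaxed execution allows it. */
        printf("r1=%d r2=%d\n", r1, r2);
        return 0;
    }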
P2 S3: Pthread silent on mem model semantics; reasons: • Many programmers don’t understand memory models • So the standard preferred “simple rules” • Instead, it “decrees”: • Synchronize thread execution using mutex_lock, mutex_unlock • It is then expected that no two threads race on a single location • (Java is more precise, even about the semantics of racing programs)
P2 S3: Pthread silent on mem model semantics; reasons: • In practice, mutex_lock etc. contain memory barriers (fences) that prevent HW reordering around the call • Calls to mutex_lock etc. are treated as opaque function calls • No instructions can be moved across them • If f() calls mutex_lock(), even f() is treated as such • Unfortunately, many real systems intentionally or unknowingly violate these rules
P4 S4: Correctness Issues • Consider this program (x, y initially 0) • Thread 1: if (x==1) ++y; • Thread 2: if (y==1) ++x; • Is (x==1, y==1) acceptable? Is there a race? • Not under SC! • However, if the compiler transforms the code to • Thread 1: ++y; if (x != 1) --y; • Thread 2: ++x; if (y != 1) --x; • then there is a race, and “x==1, y==1 is allowed” is a possible conclusion (or one may say the semantics are undefined) • (a compilable sketch of both versions follows)
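A compilable sketch of both versions (the thread-body names are mine; x and y start at 0). The source form never writes unless the other variable is already 1, so under SC nothing happens and there is no race; the transformed form writes unconditionally and then undoes the write, introducing a race the programmer never wrote:

    int x = 0, y = 0;    /* shared, both initially 0 */

    /* Source form: race-free under SC; neither branch fires. */
    void *thread1_src(void *arg) { if (x == 1) ++y; return 0; }
    void *thread2_src(void *arg) { if (y == 1) ++x; return 0; }

    /* Hypothetical compiler transformation (speculate, then undo):
       sequentially equivalent, but it writes y and x unconditionally,
       so the two threads now race on both variables.                 */
    void *thread1_opt(void *arg) { ++y; if (x != 1) --y; return 0; }
    void *thread2_opt(void *arg) { ++x; if (y != 1) --x; return 0; }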
P5 S4.2: Rewriting of adjacent data • Bugs of this type have actually arisen • struct { int a:17; int b:15; } x; • Now “x.a = 42” may be realized as • { tmp = x; tmp &= ~0x1ffff; tmp |= 42; x = tmp; } • This introduces an “unintended” write of b as well! • OK for sequential code • But in a concurrent setting, a concurrent update of “b” could now race!! • The race is not “seen” at the source level! • (a sketch follows)
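A sketch of the hazard, assuming the field layout on the slide (the thread bodies and the value 7 are mine). The whole-word read-modify-write that implements x.a = 42 silently re-stores b, so a concurrent store to x.b can be lost:

    #include <pthread.h>

    struct { int a:17; int b:15; } x;    /* two bit-fields sharing one word */

    void *writer_a(void *arg) {
        /* The source says only: x.a = 42;
           One possible lowering is a whole-word read-modify-write:
               tmp  = <word containing x>;
               tmp &= ~0x1ffff;            -- clear the 17 bits of a
               tmp |= 42;
               <word containing x> = tmp;  -- re-stores b as well!        */
        x.a = 42;
        return 0;
    }

    void *writer_b(void *arg) {
        /* This store can be lost if it lands between writer_a's read of
           the word and its write-back -- a race invisible at the source. */
        x.b = 7;
        return 0;
    }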
P5: another example • struct { char a; char b; … char h; } x; • x.b = ‘b’; x.c = ‘c’; … ; x.h = ‘h’; can be realized as a single whole-struct store, e.g. x = ‘hgfedcb\0’ | x.a • Now if you protect “a” with one lock and “b” through “h” with another lock, you are hosed: there is a data race! • C should define when adjacent data may be overwritten • (a sketch follows)
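A brief sketch of the locking pitfall (the lock names and thread bodies are illustrative). Each thread holds the lock that “owns” its fields, yet a merged whole-struct store under lock_bh re-writes a and races with the holder of lock_a:

    #include <pthread.h>

    struct { char a, b, c, d, e, f, g, h; } x;
    pthread_mutex_t lock_a  = PTHREAD_MUTEX_INITIALIZER;   /* protects a    */
    pthread_mutex_t lock_bh = PTHREAD_MUTEX_INITIALIZER;   /* protects b..h */

    void *update_a(void *arg) {
        pthread_mutex_lock(&lock_a);
        x.a = 'A';
        pthread_mutex_unlock(&lock_a);
        return 0;
    }

    void *update_b_to_h(void *arg) {
        pthread_mutex_lock(&lock_bh);
        /* A compiler may merge these byte stores into one word-sized
           read-modify-write of all of x, which also re-stores x.a --
           racing with update_a despite both threads holding "their" lock. */
        x.b = 'b'; x.c = 'c'; x.d = 'd'; x.e = 'e';
        x.f = 'f'; x.g = 'g'; x.h = 'h';
        pthread_mutex_unlock(&lock_bh);
        return 0;
    }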
P5/P6: register promotion • Compilers must be aware of the existence of threads • Consider code written to run fast in the serial case (mt is true only when multiple threads exist): • for(..){ • if (mt) lock(…); • x = …x…; • if (mt) unlock(…); • }
P5/P6: register promotion • for(..){ if (mt) lock(…); x = …x…; if (mt) unlock(…); } • can, under the Pthreads rules, be transformed by register promotion into • r = x; for(…) { • if (mt) { x = r; lock(…); r = x; } • r = …r…; • if (mt) { x = r; unlock(…); r = x; } } • x = r; • Fully broken: x is now read and written without holding the lock! • (a fuller sketch follows)
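A hedged reconstruction of the transformation as compilable C (the loop bound N, the update function f, and the mt flag are placeholders of mine):

    #include <pthread.h>

    #define N 1000                           /* placeholder loop bound     */
    static int f(int v) { return v + 1; }    /* placeholder update of x    */

    int mt;                                  /* true iff multiple threads  */
    int x;                                   /* shared when mt is true     */
    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

    /* The loop as written in the source. */
    void original(void) {
        for (int i = 0; i < N; ++i) {
            if (mt) pthread_mutex_lock(&m);
            x = f(x);
            if (mt) pthread_mutex_unlock(&m);
        }
    }

    /* What register promotion (legal under the rules as stated) can turn
       it into: x is loaded before the first lock and stored after the
       last unlock, i.e. accessed without holding the lock.              */
    void promoted(void) {
        int r = x;                           /* unprotected read of x     */
        for (int i = 0; i < N; ++i) {
            if (mt) { x = r; pthread_mutex_lock(&m);   r = x; }
            r = f(r);
            if (mt) { x = r; pthread_mutex_unlock(&m); r = x; }
        }
        x = r;                               /* unprotected write of x    */
    }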
Avoiding expensive synch. • for (mp = start; mp < 10000; ++mp) • if (!get(mp)) { • for (mult = mp; mult < 100000000; mult += mp) • if (!get(mult)) set(mult); } • Sieve of Eratosthenes • Benefits from (benign) races!! • (a sketch follows)
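A hedged sketch of this sieve in C/Pthreads (the byte-array representation of get/set, the limits as macros, and the thread signature are my assumptions). The unsynchronized get/set calls race, but a stale read of 0 only causes a number’s multiples to be marked again: wasted work, never a wrong answer, which is why the racy version can beat one that synchronizes on every access:

    #include <pthread.h>

    #define CAND_LIMIT 10000                 /* candidates, as on the slide */
    #define MARK_LIMIT 100000000             /* 10^8, as on the slide       */

    static unsigned char marked[MARK_LIMIT]; /* 1 = marked composite/processed */

    static int  get(long i) { return marked[i]; }
    static void set(long i) { marked[i] = 1; }

    /* Each thread runs this over the shared 'marked' array with no locks.
       (How 'start' is chosen per thread is outside this sketch.)          */
    void *sieve(void *arg) {
        long start = (long)arg;
        for (long mp = start; mp < CAND_LIMIT; ++mp)
            if (!get(mp))
                for (long mult = mp; mult < MARK_LIMIT; mult += mp)
                    if (!get(mult)) set(mult);
        return 0;
    }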