Consistency: Models of computation
Coherence vs. consistency • coherence deals with accesses to the same memory location • consistency addresses the possible outcomes from legal orderings of accesses to all memory locations • the common model (sequential consistency) is easy to understand but hard to implement efficiently, and has poor performance
What do you expect? • Sequential consistency: “Commit results in processor order” • simple enough in a uniprocessor • similarly with context switching: just save and restore state • what about multi-threading, or multiprocessor machines?
MIPS R10000 • issue instructions out of order • in-order commit • speculative loads may execute and forward their value to later instructions long before the load commits in program order • in the meantime, some other processor may commit a store to that location
Producer - consumer
      P1                        P2
      write (A) ;               while (flag != 1) ;
      flag := 1 ;               read (A) ;
• assumes P1’s writes become visible to P2 in program order
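A minimal C++11 rendering of this pattern (my own sketch, not the slide's code; the variable names, the value written to A, and the use of std::atomic with release/acquire ordering are assumptions): the release store of the flag guarantees the write to A becomes visible first, which is exactly the program-order visibility the slide relies on.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int A = 0;                      // the data being handed off
std::atomic<int> flag{0};       // the synchronization flag

void p1() {                     // producer
    A = 42;                                               // write (A)
    flag.store(1, std::memory_order_release);             // flag := 1, published after A
}

void p2() {                     // consumer
    while (flag.load(std::memory_order_acquire) != 1) { } // spin until the flag is set
    assert(A == 42);            // safe: acquire/release orders the flag after the write to A
}

int main() {
    std::thread t2(p2), t1(p1);
    t1.join(); t2.join();
}
```

Without the release/acquire pair (e.g. with relaxed ordering), a weakly ordered machine could let P2 observe flag == 1 before it observes the new value of A.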
One or both proceed
      P1                        P2
      X := 0 ;                  Y := 0 ;
      ...                       ...
      if (Y == 0) kill P2 ;     if (X == 0) kill P1 ;
• it’s a race through the critical section
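The same kind of race, sketched in C++ (hedged: the flag values 0/1 and the seq_cst orderings are my choices for illustration, not the slide's exact code). Each thread stores to its own variable and then tests the other's; under sequential consistency at least one thread must observe the other's store, so they cannot both slip through, but if a load may bypass the earlier store, both can.

```cpp
#include <atomic>
#include <cstdio>
#include <thread>

std::atomic<int> X{0}, Y{0};
int r1 = 0, r2 = 0;

void p1() {
    X.store(1, std::memory_order_seq_cst);   // announce P1
    r1 = Y.load(std::memory_order_seq_cst);  // did P2 announce itself?
}

void p2() {
    Y.store(1, std::memory_order_seq_cst);   // announce P2
    r2 = X.load(std::memory_order_seq_cst);  // did P1 announce itself?
}

int main() {
    std::thread t1(p1), t2(p2);
    t1.join(); t2.join();
    // Under sequential consistency r1 == 0 && r2 == 0 is impossible:
    // at least one thread must see the other's store. With weaker
    // orderings (loads bypassing earlier stores) both loads can
    // return 0, and both threads would enter the critical section.
    std::printf("r1=%d r2=%d\n", r1, r2);
}
```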
Sequential consistency • results can be mapped to some sequential execution in which the instructions of each process appear in program order; equivalently: • memory operations proceed in program order • all writes are atomic and become visible to all processors at the same time
The need to relax • strict sequential consistency has severe performance drawbacks, so: • keep sequential consistency, and use prefetch and speculation, or • relax the consistency model – and be prepared to think carefully about programs
Attributes of consistency models • system specification • which orders are preserved, and which are not? is there system support to enforce a particular order? • programmer interface: the set of rules that will lead to the expected execution • translation mechanism: how to translate program annotations to hardware actions
Alternative 1 • total store ordering: allows a read to bypass an earlier, still-incomplete write • helps hide write latency • stricter ordering can be restored where needed with fence instructions • SPARC v9 provides various memory barrier instructions
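A hedged C++ sketch of the fence idea: under total store ordering a later read may bypass a still-buffered earlier write, and a full fence between them restores the ordering (on SPARC v9 this corresponds roughly to a MEMBAR #StoreLoad, on x86 to MFENCE; the exact mapping depends on the compiler and target).

```cpp
#include <atomic>

std::atomic<int> X{0}, Y{0};

int p1() {
    X.store(1, std::memory_order_relaxed);                // write may sit in the store buffer
    std::atomic_thread_fence(std::memory_order_seq_cst);  // full fence: the write must become
                                                           // visible before the next read
    return Y.load(std::memory_order_relaxed);             // can no longer bypass the store above
}
```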
Alternative 2 • partial store ordering: allows writes as well as reads to bypass earlier writes • writes still cannot bypass reads • writes are still atomic • very different from sequential consistency: e.g. spinning on a flag doesn’t work • needs a store barrier instruction to emulate sequential consistency
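For the flag example, the store barrier sits between the data write and the flag write. A hedged C++ sketch (names and values are mine; on SPARC PSO the fence corresponds roughly to a MEMBAR #StoreStore, and the consumer still needs an acquire load of the flag, as in the earlier producer-consumer sketch):

```cpp
#include <atomic>

int data = 0;
std::atomic<int> flag{0};

void producer() {
    data = 42;                                            // data write
    // Under partial store ordering the flag store below could overtake the
    // data write, so a consumer spinning on the flag could read stale data.
    std::atomic_thread_fence(std::memory_order_release);  // store-store barrier
    flag.store(1, std::memory_order_relaxed);             // now ordered after the data write
}
```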
Alternative 3 • processor consistency: same as total store ordering, but does not guarantee atomic writes • implemented in recent Intel processors
Weak ordering • just try to preserve data and control dependencies within a process • don’t worry about the order of memory operations between synchronization points • e.g. don’t worry about the exact order of independent reads and writes within a critical section
Weak ordering • code from outside (before or after) a critical section cannot be reordered with code inside it • code before a barrier must commit before entering, code after a barrier must not issue until the barrier is left • code before a flag wait must commit before waiting, and code after must not issue before flag is set by the producer • code before setting of a flag must commit first, and code after must not issue before the flag is set
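A hedged C++ illustration of the first rule (variable names are mine): inside a lock-protected critical section, independent accesses may be reordered by the compiler or the hardware, but nothing is allowed to move across the lock or the unlock.

```cpp
#include <mutex>

std::mutex m;
int a = 0, b = 0;

void update() {
    m.lock();     // nothing after this point may appear to execute before it
    a = a + 1;    // these two independent updates may be reordered with each
    b = b * 2;    //   other inside the critical section without anyone noticing
    m.unlock();   // nothing before this point may appear to execute after it
}
```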
Weak ordering • a good match to modern CPUs and aggressive compiler optimizations • hardware must recognize synchronization, or the compiler must insert proper barriers • the MIPS R10000 provides a sync instruction and a fence count register • sync disables issue until the fence register is zero and all outstanding memory operations have committed • the fence count is incremented on an L2 miss and decremented on a reply
Release consistency • relax weak ordering further • categorize all synchronization operations as either acquire or release • an acquire is a read (load) of a protected variable, like taking a lock or waiting on a flag • a release is a write (store) granting access to others, like unlocking or setting a flag • a barrier combines a release (on arrival) and an acquire (on departure)
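A minimal spinlock sketch in C++ makes the mapping concrete (my own illustration, assuming C++11 atomics; not code from the slides): taking the lock is the acquire, releasing it is the release.

```cpp
#include <atomic>

// A minimal spinlock: acquiring the lock is an acquire operation,
// releasing it is a release operation, matching the two categories
// of synchronization in release consistency.
class SpinLock {
    std::atomic<bool> locked{false};
public:
    void lock() {
        // acquire: accesses in the critical section cannot move above this
        while (locked.exchange(true, std::memory_order_acquire)) { /* spin */ }
    }
    void unlock() {
        // release: accesses in the critical section cannot move below this
        locked.store(false, std::memory_order_release);
    }
};
```

A barrier uses both halves: an arriving thread performs a release so its prior work is published, and a departing thread performs an acquire so it sees everyone else's.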
In practice • MIPS processors are sequentially consistent • Sun supports total or partial store ordering • Intel supports processor consistency • Alpha and PowerPC support weak ordering; Power4 and Power5 do not guarantee atomic writes
Processor consistency • a simple model with good performance • writes must become visible to all processors in program order • loads can bypass writes
Back to our examples Under these rules, • does producer-consumer work? • does one-or-both work?
Results under processor consistency • producer-consumer is okay because P1’s actions are both writes, and writes must become visible in program order • one-or-both can break because loads can bypass writes • if (X == 0) is a load • Y := 0 is a write
Intel Itanium • loads are not reordered with other loads • stores are not reordered with other stores • stores are not reordered with older loads • stores to the same location have a total order • a load may be reordered with an older store to a different location
Itanium example 1
• initially, x = y = 0
      P1                        P2
      R1 <- x                   R2 <- y        (loads)
      y  <- 1                   x  <- 1        (stores)
• we will never see R1 = R2 = 1 because stores are not reordered with older loads
Itanium example 2
• initially, x = y = 0
      P1                        P2
      x  <- 1                   y  <- 1        (stores)
      R1 <- y                   R2 <- x        (loads)
• we may see R1 = R2 = 0 because loads may be reordered with older stores
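For comparison, a hedged C++ rendering of example 1 (my sketch, not Itanium code): with acquire loads and release stores the outcome R1 = R2 = 1 cannot occur, mirroring the rule that stores are not reordered with older loads; with fully relaxed atomics the C++ model would allow it, even though real hardware rarely produces it.

```cpp
#include <atomic>
#include <thread>

std::atomic<int> x{0}, y{0};
int R1 = 0, R2 = 0;

void p1() {
    R1 = x.load(std::memory_order_acquire);  // older load
    y.store(1, std::memory_order_release);   // younger store
}

void p2() {
    R2 = y.load(std::memory_order_acquire);  // older load
    x.store(1, std::memory_order_release);   // younger store
}

int main() {
    std::thread t1(p1), t2(p2);
    t1.join(); t2.join();
    // With these orderings, R1 == 1 && R2 == 1 is impossible: if P1 read
    // x == 1 it synchronized with P2's store of x, so P2's earlier load
    // of y happened before P1's store of y and must have returned 0.
}
```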