280 likes | 426 Views
Memory Consistency Models (III). Relaxing all orders. Retain control and data dependences within each thread Why? allow multiple, overlapping read operations non-blocking read, lock-up free caches and speculative execution. Relaxing all orders. Weak ordering
E N D
Relaxing all orders • Retain control and data dependences within each thread • Why? • allow multiple, overlapping read operations • non-blocking read, lock-up free caches and speculative execution
Relaxing all orders • Weak ordering • synchronization operations wait for all previous mem ops to complete • arbitrary completion ordering between • Release Consistency • acquire: read operation to gain access to set of operations or variables • release: write operation to grant access to others • acquire must before following accesses • release must wait for preceeding accesses to complete
Preserved Orderings Weak Ordering Release Consistency read/write ° ° ° read/write read/write ° ° ° read/write Acquire 1 1 read/write ° ° ° read/write 2 Synch(R) read/write ° ° ° read/write 3 read/write ° ° ° read/write Release 2 Synch(W) read/write ° ° ° read/write 3
Relaxing All Program Orders • Weak Ordering Model (WO) • Release Consistency Models (RCsc/RCpc) • Three Models proposed for commercial architectures Digital Alpha SPARC V9 relaxed memory order (RMO) IBM PowerPC
Relaxing All Program Orders • All models allow a processor to read its own writes early • RCpc and IBM PowerPC are the only models which allow a read to return the value of another processor’s write early • Two types • WO, RCsc, and RCpc models distinguish memory operations based on their type, and provide stricter ordering constraints for some types of operations • Alpha, RMO, and PowerPC models provide explicit fence instructions for imposing pgm orders between various memory operations
Pn P2 P1 M1 M2 Mn Weak Ordering • Two types of operations • data and synchronization operations Program Order R R W W Rs Ws Rs Ws Rs Rs Ws Ws R W R W R R W W R W R W + coherence + write atomicity
Weak Ordering • All sub-operations of the first operation must complete before any sub-operations of the second operation to maintain the program order ( ) • Coherence requirement : all write sub-operations to the same location to complete in the same order across all memory copies • The read sub-operation and the sub-operations of the write (possibly from a different processor) whose value is returned by the read should complete before any of the sub-operations of the operation that follows the read in program order.( ) - Dubois et al. refer to this as the read being “globally performed”.
Weak Ordering • Conflicting op: • access the same mem location and at least one of them is write • Competing op: • If it is possible for them to appear next to each other in a sequentially consistent total order • Synchronized program • All competing memory ops have been labeled as syn operations conflicting P1 A = 1; B = 1; Flag =1 P2 *(while (Flag==0);) .. = Flag . . .. = Flag u=A v=B po competing non-competing co
shared competing non-competing sync (acq, rel) non-sync Pn P2 P1 M1 M2 Mn Release Consistency (RC) Program Order RCsc Rc Rc Wc Wc Rc Wc Rc Wc Racq Racq R W R W Wrel Wrel R R W W R W R W RCpc Rc Rc Wc Wc Rc Wc Rc Wc Racq Racq R W R W Wrel Wrel R R W W R W R W +coherence
DEC Alpha • impose ordering solely through explicit fence instructions • two types of fences • the memory barrier (MB) • the write memory barrier (WMB) • the only model that enforces the program order between reads to the same location • implementation can safely exploit optimizations such as read forwarding
P1 P2 Pn F2 F1 F1 F1 F1 M DEC Alpha Program Order R R W W W R W R R W R R W W R W R W
P1 P2 Pn F1 F2 F3 F4 M SPARC V9 Relaxed Memory Order (RMO) • Extension of the TSO and PSO models • four types of fences • a four-bit opcode Program Order R R W W R W R W R R W W R W R W
Pn P2 P1 F F F F M1 M2 Mn IBM PowerPC Model • multiple-copy semantics of memory to the programmer • the conceptual system is identical to that of WO • single fence instruction, called a SYNC Program Order R R W W R W R W R R W W R W R W + Coherence An approximate representation for the PowerPC model
Relationship among models SC IBM-370 TSO PC Stricter relation PSO WO Alpha RCSC RMO PowerPC RCPC
Complete Relaxed Consistency Model • System specification • what program orders among mem operations are preserved • what mechanisms are provided to enforce order explicitly, when desired • Programmer’s interface • what program annotations are available • what ‘rules’ must be followed to maintain the illusion of SC • Translation mechanism
Programmer’s Interface • weak ordering allows programmer to reason in terms of SC, as long as programs are ‘data race free’ • release consistency allows programmer to reason in terms of SC for “properly labeled programs” • lock is acquire • unlock is release • barrier is both • ok if no synchronization conveyed through ordinary variables
Identifying Synch events • two memory operation in different threads conflict ifthey access same location and one is write • two conflicting operations compete if one may follow the other in a SC execution with no intervening memory operations on shared data • a parallel program is synchronized if all competing memory operations have been labeled as synchronization operations • perhaps differentiated into acquire and release • allows programmer to reason in terms of SC, rather than underlying potential reorderings
p1 p2 write A write B test (flag) set (flag) test (flag) read A read B P.O. P.O. C.O. Example • Accesses to flag are competing • they constitute a Data Race • two conflicting accesses in different threads not ordered by intervening accesses • Accesses to A (or B) conflict, but do not compete • as long as accesses to flag are labeled as synchronizing
Properly Synchronized Programs • All synchronization operations explicitly identified • All data accesses ordered though synchronizations • no data races! => Compiler generated programs from structured high-level parallel languages
How should programs be labeled? • Data parallel statements a la HPF (Forall loop) • Library routines (lock, unlock, barrier) • Variable attributes for declarations • Annotations at the statement level
READ READ WRITE READ WRITE READ WRITE WRITE Properly-Labeled Model - Level One (PL1) READ/WRITE ... READWRITE READ/WRITE ... READWRITE READ/WRITE ... READWRITE READ/WRITE ... READWRITE READ/WRITE ... READWRITE shared competing READ/WRITE ... READWRITE non-competing
PL2 shared competing non-competing sync (acq, rel) non-sync Labels a1,a2: non-sync c1,c2(test): sync c1c2(set): non-sync d1,d2: non-competing f1,f2: non-sync g1,g2: sync P1 P2 a1: u= Bound a2: u=Bound b1: if (local_bound<u){ b2: if (local_bound<u){ c1: while (test&set(L)==1); c2: while (test&set(L)==1); d1: u=Bound; d2: u=Bound e1: if (local_bound<u) e2: if (local_bound<u) f1: Bound = local_bound; f2: Bound = local_bound; g1: L = 0; g2: L = 0; h1: } h2:} ...... .......
test(READ) (sync) set(WRTIE) (non-sync) PL2 programs READ/WRITE .... READ/WRITE test(READ) (sync) set(WRTIE) (non-sync) READ/WRITE .... READ/WRITE READ/WRITE .... READ/WRITE READ/WRITE .... READ/WRITE READ/WRITE .... READ/WRITE unset(WRITE) (sync) unset(WRITE) (sync) READ/WRITE .... READ/WRITE
Summary of Programmer Model • Contract between programmer and system: • programmer provides synchronized programs • system provides effective “sequential consistency” with more room for optimization • Allows portability over a range of implementations • Research on similar frameworks: • Properly-labeled (PL) programs - Gharachorloo 90 • Data-race-free (DRF) - Adve 90 • Unifying framework (PLpc) - Gharachorloo,Adve 92
Interplay of Micro and multi processor design • Multiprocessors tend to inherit consistency model from their microprocessor • MIPS R10000 -> SGI Origin: SC • PPro -> NUMA-Q: PC • Sparc: TSO, PSO, RMO • Can weaken model or strengthen it • As micros get better at speculation and reordering it is easier to provide SC without as severe performance penalties • speculative execution • speculative loads • write-completion (precise interrupts)